Project 4: Predicting PM2.5 Levels from Fire Intensity Data¶

Rupert Williams and Vince Caristo

Hypothesis: Summertime PM2.5 concentrations in the Western United States can be reliably estimated by applying a linear regression model to fire radiative power data from the preceding two weeks¶

Sources:¶

  • MERRA-2 (PM2.5): https://gmao.gsfc.nasa.gov/reanalysis/MERRA-2/
  • VIIRS NOAA-20 (Fire Intensity): https://firms.modaps.eosdis.nasa.gov/download/
InĀ [1]:
import xarray as xr
import rioxarray as xrio
import earthaccess
import boto3
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt
import warnings
from IPython.display import display, Markdown
import numpy as np
from pathlib import Path
import netCDF4
import geopandas as gpd
Path.cwd()
import matplotlib.gridspec as gridspec
from datetime import datetime, timedelta

warnings.filterwarnings('ignore')
%matplotlib inline

1. Obtain MERRA-2 Data (PM2.5)¶

  • use the EarthAccess API
  • get weekly samples for an entire summer season
InĀ [2]:
if (boto3.client('s3').meta.region_name == 'us-west-2'):
    display(Markdown('### us-west-2 Region Check: ✅'))
else:
    display(Markdown('### us-west-2 Region Check: ❌'))
    raise ValueError('Your notebook is not running inside the AWS us-west-2 region, and will not be able to directly access NASA Earthdata S')

us-west-2 Region Check: āœ…Ā¶

InĀ [3]:
# Authenticate using Earthdata Login prerequisite files
auth = earthaccess.login()

Identify the appropriate "granule" from a GES-DISC web search (https://disc.gsfc.nasa.gov):¶

MERRA2_CNN_HAQAST bias corrected global hourly surface total PM2.5 mass concentration¶

  • https://disc.gsfc.nasa.gov/datasets/MERRA2_CNN_HAQAST_PM25_1/summary
  • 2000-01-01 to 2024-06-
  • C3094710982-GES_DISC
  • s3://gesdisc-cumulus-prod-protected/HAQAST/MERRA2_CNN_HAQAST_PM25.1/

Data points: 27 sundays in 2022¶

InĀ [4]:
from datetime import datetime, timedelta

def get_sundays(start_date, end_date):
    """
    Return a list of Sundays within the given date interval.
    
    Parameters:
        start_date (str): The start date in "YYYY-MM-DD" format.
        end_date (str): The end date in "YYYY-MM-DD" format.
    
    Returns:
        list of str: List of Sundays in "YYYY-MM-DD" format.
    """
    # Convert start and end dates to datetime objects
    start = datetime.strptime(start_date, "%Y-%m-%d")
    end = datetime.strptime(end_date, "%Y-%m-%d")
    
    # Find the first Sunday on or after the start date
    current = start + timedelta(days=(6 - start.weekday()) % 7)
    
    # Generate all Sundays within the interval
    sundays = []
    while current <= end:
        sundays.append(current.strftime("%Y-%m-%d"))
        current += timedelta(days=7)  # Move to the next Sunday
    
    return sundays

# Example usage
start_date = "2022-05-01"
end_date = "2022-10-30"
sundays = get_sundays(start_date, end_date)
print(sundays)
['2022-05-01', '2022-05-08', '2022-05-15', '2022-05-22', '2022-05-29', '2022-06-05', '2022-06-12', '2022-06-19', '2022-06-26', '2022-07-03', '2022-07-10', '2022-07-17', '2022-07-24', '2022-07-31', '2022-08-07', '2022-08-14', '2022-08-21', '2022-08-28', '2022-09-04', '2022-09-11', '2022-09-18', '2022-09-25', '2022-10-02', '2022-10-09', '2022-10-16', '2022-10-23', '2022-10-30']
InĀ [5]:
Path.cwd() /'merra_data'
Out[5]:
PosixPath('/var/home/rupertwilliams/Homes/gis/PyCharmProjects/GIS/Assignment 4/merra2_linReg/merra_data')

Download the data (to disk, or directly to memory)¶

  • one file per day
InĀ [Ā ]:
# option 1: download files to disk
for sunday in sundays:
    # Search for the granule by DOI
    results = earthaccess.search_data(
    concept_id="C3094710982-GES_DISC", temporal=(sunday,sunday))
    files = earthaccess.download(results, Path.cwd() /'merra_data')
  • if downloading to disk, generate list of file pointers
InĀ [7]:
# if downloading to disk, filter out uneccessary files

import os

def get_matching_files(folder_path, sundays):
    """
    Return a list of files in the folder that match the given list of Sunday dates.
    
    Parameters:
        folder_path (str): Path to the folder containing the files.
        sundays (list of str): List of Sundays in "YYYYMMDD" format to match.
        
    Returns:
        list of str: List of file paths matching the dates.
    """
    # Format Sundays in the "YYYYMMDD" format for matching
    sunday_strings = [date.replace("-", "") for date in sundays]
    
    # Get all files in the folder
    files = os.listdir(folder_path)
    
    # Filter files that contain a matching date
    matching_files = [
        os.path.join(folder_path, f)
        for f in files
        if any(date in f for date in sunday_strings)
    ]
    
    return matching_files

files = get_matching_files(Path.cwd() /'merra_data', sundays)
print(files)
['/var/home/rupertwilliams/Homes/gis/PyCharmProjects/GIS/Assignment 4/merra2_linReg/merra_data/MERRA2_HAQAST_CNN_L4_V1.20220522.nc4', '/var/home/rupertwilliams/Homes/gis/PyCharmProjects/GIS/Assignment 4/merra2_linReg/merra_data/MERRA2_HAQAST_CNN_L4_V1.20220508.nc4', '/var/home/rupertwilliams/Homes/gis/PyCharmProjects/GIS/Assignment 4/merra2_linReg/merra_data/MERRA2_HAQAST_CNN_L4_V1.20220501.nc4', '/var/home/rupertwilliams/Homes/gis/PyCharmProjects/GIS/Assignment 4/merra2_linReg/merra_data/MERRA2_HAQAST_CNN_L4_V1.20220515.nc4', '/var/home/rupertwilliams/Homes/gis/PyCharmProjects/GIS/Assignment 4/merra2_linReg/merra_data/MERRA2_HAQAST_CNN_L4_V1.20220529.nc4', '/var/home/rupertwilliams/Homes/gis/PyCharmProjects/GIS/Assignment 4/merra2_linReg/merra_data/MERRA2_HAQAST_CNN_L4_V1.20220605.nc4', '/var/home/rupertwilliams/Homes/gis/PyCharmProjects/GIS/Assignment 4/merra2_linReg/merra_data/MERRA2_HAQAST_CNN_L4_V1.20220612.nc4', '/var/home/rupertwilliams/Homes/gis/PyCharmProjects/GIS/Assignment 4/merra2_linReg/merra_data/MERRA2_HAQAST_CNN_L4_V1.20220619.nc4', '/var/home/rupertwilliams/Homes/gis/PyCharmProjects/GIS/Assignment 4/merra2_linReg/merra_data/MERRA2_HAQAST_CNN_L4_V1.20220626.nc4', '/var/home/rupertwilliams/Homes/gis/PyCharmProjects/GIS/Assignment 4/merra2_linReg/merra_data/MERRA2_HAQAST_CNN_L4_V1.20220703.nc4', '/var/home/rupertwilliams/Homes/gis/PyCharmProjects/GIS/Assignment 4/merra2_linReg/merra_data/MERRA2_HAQAST_CNN_L4_V1.20220710.nc4', '/var/home/rupertwilliams/Homes/gis/PyCharmProjects/GIS/Assignment 4/merra2_linReg/merra_data/MERRA2_HAQAST_CNN_L4_V1.20220717.nc4', '/var/home/rupertwilliams/Homes/gis/PyCharmProjects/GIS/Assignment 4/merra2_linReg/merra_data/MERRA2_HAQAST_CNN_L4_V1.20220724.nc4', '/var/home/rupertwilliams/Homes/gis/PyCharmProjects/GIS/Assignment 4/merra2_linReg/merra_data/MERRA2_HAQAST_CNN_L4_V1.20220731.nc4', '/var/home/rupertwilliams/Homes/gis/PyCharmProjects/GIS/Assignment 4/merra2_linReg/merra_data/MERRA2_HAQAST_CNN_L4_V1.20220807.nc4', '/var/home/rupertwilliams/Homes/gis/PyCharmProjects/GIS/Assignment 4/merra2_linReg/merra_data/MERRA2_HAQAST_CNN_L4_V1.20220814.nc4', '/var/home/rupertwilliams/Homes/gis/PyCharmProjects/GIS/Assignment 4/merra2_linReg/merra_data/MERRA2_HAQAST_CNN_L4_V1.20220821.nc4', '/var/home/rupertwilliams/Homes/gis/PyCharmProjects/GIS/Assignment 4/merra2_linReg/merra_data/MERRA2_HAQAST_CNN_L4_V1.20220828.nc4', '/var/home/rupertwilliams/Homes/gis/PyCharmProjects/GIS/Assignment 4/merra2_linReg/merra_data/MERRA2_HAQAST_CNN_L4_V1.20220904.nc4', '/var/home/rupertwilliams/Homes/gis/PyCharmProjects/GIS/Assignment 4/merra2_linReg/merra_data/MERRA2_HAQAST_CNN_L4_V1.20220911.nc4', '/var/home/rupertwilliams/Homes/gis/PyCharmProjects/GIS/Assignment 4/merra2_linReg/merra_data/MERRA2_HAQAST_CNN_L4_V1.20220918.nc4', '/var/home/rupertwilliams/Homes/gis/PyCharmProjects/GIS/Assignment 4/merra2_linReg/merra_data/MERRA2_HAQAST_CNN_L4_V1.20220925.nc4', '/var/home/rupertwilliams/Homes/gis/PyCharmProjects/GIS/Assignment 4/merra2_linReg/merra_data/MERRA2_HAQAST_CNN_L4_V1.20221002.nc4', '/var/home/rupertwilliams/Homes/gis/PyCharmProjects/GIS/Assignment 4/merra2_linReg/merra_data/MERRA2_HAQAST_CNN_L4_V1.20221009.nc4', '/var/home/rupertwilliams/Homes/gis/PyCharmProjects/GIS/Assignment 4/merra2_linReg/merra_data/MERRA2_HAQAST_CNN_L4_V1.20221016.nc4', '/var/home/rupertwilliams/Homes/gis/PyCharmProjects/GIS/Assignment 4/merra2_linReg/merra_data/MERRA2_HAQAST_CNN_L4_V1.20221023.nc4', '/var/home/rupertwilliams/Homes/gis/PyCharmProjects/GIS/Assignment 4/merra2_linReg/merra_data/MERRA2_HAQAST_CNN_L4_V1.20221030.nc4']

2. Regrid the PM2.5 data¶

  • Western CONUS
  • meters

setup projection, bounds of data¶

InĀ [8]:
lon_min, lon_max = -140, -90
lat_min, lat_max = 35, 55

# a simple Albers Equal Area projection 
crs_proj4 = "+proj=aea +lat_1=42 +lat_2=48 +lat_0=45 +lon_0=-115 +datum=WGS84 +units=m +no_defs"

x_min, x_max = -1500000, 1500000
y_min, y_max = -750000, 1000000
resolution = 5000 # meters

# Ensure bounds are divisible by the resolution
x_range = x_max - x_min
y_range = y_max - y_min

x_steps = x_range // resolution
y_steps = y_range // resolution

# Corrected bounds
x_max = x_min + (x_steps * resolution)
y_max = y_min + (y_steps * resolution)

dst_bounds = (x_min, x_max, y_min, y_max)
  • for each file, a function to extract a slice in time (12:30 pm) and space (bounds in previous cell)
InĀ [9]:
from rasterio.enums import Resampling

def process_dataset(file):
    ds = xr.open_mfdataset(file, combine='by_coords', engine='netcdf4', decode_coords='all')
    ds = ds.rio.write_crs("EPSG:4326", inplace=True)
    
    ds = ds.rio.set_spatial_dims(x_dim="lon", y_dim="lat", inplace=True)
    ds_time_filtered = ds.sel(time=ds.time.dt.strftime('%H:%M:%S') == '12:30:00')

    # Filter spatial bounds
    ds_filtered = ds_time_filtered.sel(lon=slice(lon_min, lon_max), lat=slice(lat_min, lat_max))
    ds_filtered = ds_filtered.rio.set_spatial_dims(x_dim="lon", y_dim="lat", inplace=True)
    
    ds_reprojected = ds_filtered.rio.reproject(
        crs_proj4,
        dst_bounds=dst_bounds,
        resolution=(resolution, resolution),
        resampling=Resampling.bilinear
    )       
    
    ds_reprojected = ds_reprojected.sel(x=slice(x_min, x_max), y=slice(y_max, y_min))
    ds_reprojected = ds_reprojected.rio.write_crs(crs_proj4, inplace=True)
    
    return ds_reprojected
InĀ [10]:
datasets = [process_dataset(file) for file in files]

combined_ds = xr.concat(datasets, dim='time')

We have 27 arrays, each on a 350 x 600 grid¶

InĀ [11]:
combined_ds
Out[11]:
<xarray.Dataset> Size: 68MB
Dimensions:                  (time: 27, y: 350, x: 600)
Coordinates:
  * x                        (x) float64 5kB -1.497e+06 -1.492e+06 ... 1.498e+06
  * y                        (y) float64 3kB 9.959e+05 9.909e+05 ... -7.491e+05
  * time                     (time) datetime64[ns] 216B 2022-05-22T12:30:00 ....
    spatial_ref              int64 8B 0
Data variables:
    MERRA2_CNN_Surface_PM25  (time, y, x) float32 23MB 2.803 2.917 ... 10.38
    QFLAG                    (time, y, x) float64 45MB 2.0 2.0 2.0 ... 4.0 4.0
Attributes: (12/33)
    Comment:                           filename: MERRA2_HAQAST_CNN_L4_V1_2022...
    Filename:                          MERRA2_HAQAST_CNN_L4_V1_20220522.nc4
    Conventions:                       CF-1
    Institution:                       NASA Goddard Space Flight Center
    References:                        http://gmao.gsfc.nasa.gov ; https://do...
    Format:                            NetCDF-4/HDF-5
    ...                                ...
    RangeEndingDate:                   2022-05-22
    RangeEndingTime:                   23:59:59.000000
    DODS_EXTRA.Unlimited_Dimension:    time
    history:                           2023-04-21 04:24:38 GMT Hyrax-1.16.3 h...
    ProcessingLevel:                   4
    MapProjection:                     Geographic lat/lon. Datum: WGS-84
xarray.Dataset
    • time: 27
    • y: 350
    • x: 600
    • x
      (x)
      float64
      -1.497e+06 -1.492e+06 ... 1.498e+06
      axis :
      X
      long_name :
      x coordinate of projection
      standard_name :
      projection_x_coordinate
      units :
      metre
      array([-1496524.326433, -1491524.326433, -1486524.326433, ...,  1488475.673567,
              1493475.673567,  1498475.673567])
    • y
      (y)
      float64
      9.959e+05 9.909e+05 ... -7.491e+05
      axis :
      Y
      long_name :
      y coordinate of projection
      standard_name :
      projection_y_coordinate
      units :
      metre
      array([ 995893.546327,  990893.546327,  985893.546327, ..., -739106.453673,
             -744106.453673, -749106.453673])
    • time
      (time)
      datetime64[ns]
      2022-05-22T12:30:00 ... 2022-10-...
      long_name :
      time
      array(['2022-05-22T12:30:00.000000000', '2022-05-08T12:30:00.000000000',
             '2022-05-01T12:30:00.000000000', '2022-05-15T12:30:00.000000000',
             '2022-05-29T12:30:00.000000000', '2022-06-05T12:30:00.000000000',
             '2022-06-12T12:30:00.000000000', '2022-06-19T12:30:00.000000000',
             '2022-06-26T12:30:00.000000000', '2022-07-03T12:30:00.000000000',
             '2022-07-10T12:30:00.000000000', '2022-07-17T12:30:00.000000000',
             '2022-07-24T12:30:00.000000000', '2022-07-31T12:30:00.000000000',
             '2022-08-07T12:30:00.000000000', '2022-08-14T12:30:00.000000000',
             '2022-08-21T12:30:00.000000000', '2022-08-28T12:30:00.000000000',
             '2022-09-04T12:30:00.000000000', '2022-09-11T12:30:00.000000000',
             '2022-09-18T12:30:00.000000000', '2022-09-25T12:30:00.000000000',
             '2022-10-02T12:30:00.000000000', '2022-10-09T12:30:00.000000000',
             '2022-10-16T12:30:00.000000000', '2022-10-23T12:30:00.000000000',
             '2022-10-30T12:30:00.000000000'], dtype='datetime64[ns]')
    • spatial_ref
      ()
      int64
      0
      crs_wkt :
      PROJCS["unknown",GEOGCS["unknown",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]]],PROJECTION["Albers_Conic_Equal_Area"],PARAMETER["latitude_of_center",45],PARAMETER["longitude_of_center",-115],PARAMETER["standard_parallel_1",42],PARAMETER["standard_parallel_2",48],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH]]
      semi_major_axis :
      6378137.0
      semi_minor_axis :
      6356752.314245179
      inverse_flattening :
      298.257223563
      reference_ellipsoid_name :
      WGS 84
      longitude_of_prime_meridian :
      0.0
      prime_meridian_name :
      Greenwich
      geographic_crs_name :
      unknown
      horizontal_datum_name :
      World Geodetic System 1984
      projected_crs_name :
      unknown
      grid_mapping_name :
      albers_conical_equal_area
      standard_parallel :
      (42.0, 48.0)
      latitude_of_projection_origin :
      45.0
      longitude_of_central_meridian :
      -115.0
      false_easting :
      0.0
      false_northing :
      0.0
      spatial_ref :
      PROJCS["unknown",GEOGCS["unknown",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]]],PROJECTION["Albers_Conic_Equal_Area"],PARAMETER["latitude_of_center",45],PARAMETER["longitude_of_center",-115],PARAMETER["standard_parallel_1",42],PARAMETER["standard_parallel_2",48],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH]]
      GeoTransform :
      -2309024.326432758 5000.0 0.0 1388393.546326695 0.0 -5000.0
      array(0)
    • MERRA2_CNN_Surface_PM25
      (time, y, x)
      float32
      2.803 2.917 2.855 ... 10.31 10.38
      units :
      ug m-3
      long_name :
      Bias corrected surface total PM2.5 mass concentration
      array([[[2.80319929e+00, 2.91730952e+00, 2.85547781e+00, ...,
               1.24593735e-05, 9.45536813e-05, 2.72322621e-04],
              [2.87739038e+00, 2.98492169e+00, 2.88842416e+00, ...,
               1.42758108e-05, 5.31091937e-05, 2.49106961e-04],
              [2.96227264e+00, 3.01779151e+00, 2.92145991e+00, ...,
               1.13591559e-05, 5.70225609e-07, 2.14771106e-04],
              ...,
              [8.20281133e-02, 1.00274779e-01, 1.19042166e-01, ...,
               1.13938737e+00, 1.02916467e+00, 9.23335314e-01],
              [1.06056236e-01, 1.23177528e-01, 1.40818924e-01, ...,
               1.15250039e+00, 1.05170274e+00, 9.55293298e-01],
              [1.29588157e-01, 1.45585299e-01, 1.62101895e-01, ...,
               1.16146147e+00, 1.07007885e+00, 9.83079076e-01]],
      
             [[1.88523643e-02, 1.76367667e-02, 1.69822145e-02, ...,
               1.05465078e+00, 1.03993058e+00, 1.06375015e+00],
              [1.87412091e-02, 1.76020432e-02, 1.70893967e-02, ...,
               1.07569540e+00, 1.03132951e+00, 1.05612016e+00],
              [1.86296050e-02, 1.76992510e-02, 1.72118116e-02, ...,
               1.10119867e+00, 1.02213800e+00, 1.04789841e+00],
      ...
              [3.79566103e-01, 3.65422308e-01, 3.50830168e-01, ...,
               5.56936026e+00, 5.67352676e+00, 5.77631760e+00],
              [3.71755898e-01, 3.58579844e-01, 3.44955981e-01, ...,
               5.51105738e+00, 5.61222744e+00, 5.71202326e+00],
              [3.64373237e-01, 3.52163911e-01, 3.39507282e-01, ...,
               5.45407152e+00, 5.55224848e+00, 5.64905310e+00]],
      
             [[7.90751982e+00, 7.91771650e+00, 7.92481327e+00, ...,
               8.53381634e-01, 8.68257940e-01, 8.91305208e-01],
              [7.87675524e+00, 7.88557339e+00, 7.89231062e+00, ...,
               8.62468064e-01, 8.74697447e-01, 9.00063932e-01],
              [7.84532595e+00, 7.85294247e+00, 7.86002302e+00, ...,
               8.70708168e-01, 8.79723012e-01, 9.07405496e-01],
              ...,
              [3.86608690e-01, 3.52564454e-01, 3.20915490e-01, ...,
               1.04376144e+01, 1.05251646e+01, 1.06089659e+01],
              [4.20164615e-01, 3.81019294e-01, 3.44266385e-01, ...,
               1.03375454e+01, 1.04170561e+01, 1.04928226e+01],
              [4.51468110e-01, 4.07227188e-01, 3.65375817e-01, ...,
               1.02410116e+01, 1.03124924e+01, 1.03802328e+01]]], dtype=float32)
    • QFLAG
      (time, y, x)
      float64
      2.0 2.0 2.0 2.0 ... 4.0 4.0 4.0 4.0
      FLAG 1 :
      Low quality
      FLAG 2 :
      Medium quality over ocean
      FLAG 3 :
      High quality
      FLAG 4 :
      Highest quality
      Recommendation :
      qflag recommnendation -3 & 4 for quantitative analysis
      long_name :
      Quality Flag for data quality at grid
      units :
      unitless
      array([[[2., 2., 2., ..., 3., 3., 3.],
              [2., 2., 2., ..., 3., 3., 3.],
              [2., 2., 2., ..., 3., 3., 3.],
              ...,
              [2., 2., 2., ..., 4., 4., 4.],
              [2., 2., 2., ..., 4., 4., 4.],
              [2., 2., 2., ..., 4., 4., 4.]],
      
             [[2., 2., 2., ..., 3., 3., 3.],
              [2., 2., 2., ..., 3., 3., 3.],
              [2., 2., 2., ..., 3., 3., 3.],
              ...,
              [2., 2., 2., ..., 4., 4., 4.],
              [2., 2., 2., ..., 4., 4., 4.],
              [2., 2., 2., ..., 4., 4., 4.]],
      
             [[2., 2., 2., ..., 3., 3., 3.],
              [2., 2., 2., ..., 3., 3., 3.],
              [2., 2., 2., ..., 3., 3., 3.],
              ...,
      ...
              ...,
              [2., 2., 2., ..., 4., 4., 4.],
              [2., 2., 2., ..., 4., 4., 4.],
              [2., 2., 2., ..., 4., 4., 4.]],
      
             [[2., 2., 2., ..., 3., 3., 3.],
              [2., 2., 2., ..., 3., 3., 3.],
              [2., 2., 2., ..., 3., 3., 3.],
              ...,
              [2., 2., 2., ..., 4., 4., 4.],
              [2., 2., 2., ..., 4., 4., 4.],
              [2., 2., 2., ..., 4., 4., 4.]],
      
             [[2., 2., 2., ..., 3., 3., 3.],
              [2., 2., 2., ..., 3., 3., 3.],
              [2., 2., 2., ..., 3., 3., 3.],
              ...,
              [2., 2., 2., ..., 4., 4., 4.],
              [2., 2., 2., ..., 4., 4., 4.],
              [2., 2., 2., ..., 4., 4., 4.]]])
    • x
      PandasIndex
      PandasIndex(Index([-1496524.326432758, -1491524.326432758, -1486524.326432758,
             -1481524.326432758, -1476524.326432758, -1471524.326432758,
             -1466524.326432758, -1461524.326432758, -1456524.326432758,
             -1451524.326432758,
             ...
              1453475.673567242,  1458475.673567242,  1463475.673567242,
              1468475.673567242,  1473475.673567242,  1478475.673567242,
              1483475.673567242,  1488475.673567242,  1493475.673567242,
              1498475.673567242],
            dtype='float64', name='x', length=600))
    • y
      PandasIndex
      PandasIndex(Index([ 995893.546326695,  990893.546326695,  985893.546326695,
              980893.546326695,  975893.546326695,  970893.546326695,
              965893.546326695,  960893.546326695,  955893.546326695,
              950893.546326695,
             ...
             -704106.453673305, -709106.453673305, -714106.453673305,
             -719106.453673305, -724106.453673305, -729106.453673305,
             -734106.453673305, -739106.453673305, -744106.453673305,
             -749106.453673305],
            dtype='float64', name='y', length=350))
    • time
      PandasIndex
      PandasIndex(DatetimeIndex(['2022-05-22 12:30:00', '2022-05-08 12:30:00',
                     '2022-05-01 12:30:00', '2022-05-15 12:30:00',
                     '2022-05-29 12:30:00', '2022-06-05 12:30:00',
                     '2022-06-12 12:30:00', '2022-06-19 12:30:00',
                     '2022-06-26 12:30:00', '2022-07-03 12:30:00',
                     '2022-07-10 12:30:00', '2022-07-17 12:30:00',
                     '2022-07-24 12:30:00', '2022-07-31 12:30:00',
                     '2022-08-07 12:30:00', '2022-08-14 12:30:00',
                     '2022-08-21 12:30:00', '2022-08-28 12:30:00',
                     '2022-09-04 12:30:00', '2022-09-11 12:30:00',
                     '2022-09-18 12:30:00', '2022-09-25 12:30:00',
                     '2022-10-02 12:30:00', '2022-10-09 12:30:00',
                     '2022-10-16 12:30:00', '2022-10-23 12:30:00',
                     '2022-10-30 12:30:00'],
                    dtype='datetime64[ns]', name='time', freq=None))
  • Comment :
    filename: MERRA2_HAQAST_CNN_L4_V1_20220522.nc4
    Filename :
    MERRA2_HAQAST_CNN_L4_V1_20220522.nc4
    Conventions :
    CF-1
    Institution :
    NASA Goddard Space Flight Center
    References :
    http://gmao.gsfc.nasa.gov ; https://doi.org/10.4209/aaqr.210105 ; https://doi.org/10.1029/2022EA002375
    Format :
    NetCDF-4/HDF-5
    SpatialCoverage :
    global
    VersionID :
    V1
    TemporalRange :
    2000-01-01 -> 2024-12-31
    identifier_product_doi_authority :
    http://dx.doi.org/
    ShortName :
    MERRA2_CNN_HAQAST_PM25
    GranuleID :
    MERRA2_HAQAST_CNN_L4_V1_20220522.nc4
    ProductionDateTime :
    2024-06-18T06:44:33.646033Z
    LongName :
    MERRA2_CNN_HAQAST bias corrected global hourly surface total PM2.5 mass concentration V1.0
    title :
    MERRA2_CNN_HAQAST bias corrected global hourly surface total PM2.5 mass concentration V1.0
    SouthernmostLatitude :
    -90.0
    NorthernmostLatitude :
    90.0
    WesternmostLongitude :
    -180.0
    EasternmostLongitude :
    179.375
    LatitudeResolution :
    0.5
    LongitudeResolution :
    0.625
    DataResolution :
    0.5 x 0.625
    Source :
    M2T1NXAER_5.12.4: BCSMASS,DUSMASS25,OCSMASS,SO2SMASS,SO4SMASS,SSSMASS25,TOTEXTTAU
    Contact :
    Pawan Gupta (pawan.gupta@nasa.gov), and Alqamah Sayeed (alqamah.sayeed@nasa.gov)
    identifier_product_doi :
    10.5067/OCKK5HCFW5N3
    RangeBeginningDate :
    2022-05-22
    RangeBeginningTime :
    00:00:00.000000
    RangeEndingDate :
    2022-05-22
    RangeEndingTime :
    23:59:59.000000
    DODS_EXTRA.Unlimited_Dimension :
    time
    history :
    2023-04-21 04:24:38 GMT Hyrax-1.16.3 https://goldsmr4.gesdisc.eosdis.nasa.gov/opendap/MERRA2/M2T1NXAER.5.12.4/2022/12/MERRA2_400.tavg1_2d_aer_Nx.20221231.nc4.nc4?BCSMASS,DUSMASS25,OCSMASS,SO2SMASS,SO4SMASS,SSSMASS25,TOTEXTTAU
    ProcessingLevel :
    4
    MapProjection :
    Geographic lat/lon. Datum: WGS-84
InĀ [12]:
combined_ds.MERRA2_CNN_Surface_PM25[0]
Out[12]:
<xarray.DataArray 'MERRA2_CNN_Surface_PM25' (y: 350, x: 600)> Size: 840kB
array([[2.80319929e+00, 2.91730952e+00, 2.85547781e+00, ...,
        1.24593735e-05, 9.45536813e-05, 2.72322621e-04],
       [2.87739038e+00, 2.98492169e+00, 2.88842416e+00, ...,
        1.42758108e-05, 5.31091937e-05, 2.49106961e-04],
       [2.96227264e+00, 3.01779151e+00, 2.92145991e+00, ...,
        1.13591559e-05, 5.70225609e-07, 2.14771106e-04],
       ...,
       [8.20281133e-02, 1.00274779e-01, 1.19042166e-01, ...,
        1.13938737e+00, 1.02916467e+00, 9.23335314e-01],
       [1.06056236e-01, 1.23177528e-01, 1.40818924e-01, ...,
        1.15250039e+00, 1.05170274e+00, 9.55293298e-01],
       [1.29588157e-01, 1.45585299e-01, 1.62101895e-01, ...,
        1.16146147e+00, 1.07007885e+00, 9.83079076e-01]], dtype=float32)
Coordinates:
  * x            (x) float64 5kB -1.497e+06 -1.492e+06 ... 1.493e+06 1.498e+06
  * y            (y) float64 3kB 9.959e+05 9.909e+05 ... -7.441e+05 -7.491e+05
    time         datetime64[ns] 8B 2022-05-22T12:30:00
    spatial_ref  int64 8B 0
Attributes:
    units:      ug m-3
    long_name:  Bias corrected surface total PM2.5 mass concentration
xarray.DataArray
'MERRA2_CNN_Surface_PM25'
  • y: 350
  • x: 600
  • 2.803 2.917 2.855 2.759 2.662 2.565 ... 1.357 1.257 1.161 1.07 0.9831
    array([[2.80319929e+00, 2.91730952e+00, 2.85547781e+00, ...,
            1.24593735e-05, 9.45536813e-05, 2.72322621e-04],
           [2.87739038e+00, 2.98492169e+00, 2.88842416e+00, ...,
            1.42758108e-05, 5.31091937e-05, 2.49106961e-04],
           [2.96227264e+00, 3.01779151e+00, 2.92145991e+00, ...,
            1.13591559e-05, 5.70225609e-07, 2.14771106e-04],
           ...,
           [8.20281133e-02, 1.00274779e-01, 1.19042166e-01, ...,
            1.13938737e+00, 1.02916467e+00, 9.23335314e-01],
           [1.06056236e-01, 1.23177528e-01, 1.40818924e-01, ...,
            1.15250039e+00, 1.05170274e+00, 9.55293298e-01],
           [1.29588157e-01, 1.45585299e-01, 1.62101895e-01, ...,
            1.16146147e+00, 1.07007885e+00, 9.83079076e-01]], dtype=float32)
    • x
      (x)
      float64
      -1.497e+06 -1.492e+06 ... 1.498e+06
      axis :
      X
      long_name :
      x coordinate of projection
      standard_name :
      projection_x_coordinate
      units :
      metre
      array([-1496524.326433, -1491524.326433, -1486524.326433, ...,  1488475.673567,
              1493475.673567,  1498475.673567])
    • y
      (y)
      float64
      9.959e+05 9.909e+05 ... -7.491e+05
      axis :
      Y
      long_name :
      y coordinate of projection
      standard_name :
      projection_y_coordinate
      units :
      metre
      array([ 995893.546327,  990893.546327,  985893.546327, ..., -739106.453673,
             -744106.453673, -749106.453673])
    • time
      ()
      datetime64[ns]
      2022-05-22T12:30:00
      long_name :
      time
      array('2022-05-22T12:30:00.000000000', dtype='datetime64[ns]')
    • spatial_ref
      ()
      int64
      0
      crs_wkt :
      PROJCS["unknown",GEOGCS["unknown",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]]],PROJECTION["Albers_Conic_Equal_Area"],PARAMETER["latitude_of_center",45],PARAMETER["longitude_of_center",-115],PARAMETER["standard_parallel_1",42],PARAMETER["standard_parallel_2",48],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH]]
      semi_major_axis :
      6378137.0
      semi_minor_axis :
      6356752.314245179
      inverse_flattening :
      298.257223563
      reference_ellipsoid_name :
      WGS 84
      longitude_of_prime_meridian :
      0.0
      prime_meridian_name :
      Greenwich
      geographic_crs_name :
      unknown
      horizontal_datum_name :
      World Geodetic System 1984
      projected_crs_name :
      unknown
      grid_mapping_name :
      albers_conical_equal_area
      standard_parallel :
      (42.0, 48.0)
      latitude_of_projection_origin :
      45.0
      longitude_of_central_meridian :
      -115.0
      false_easting :
      0.0
      false_northing :
      0.0
      spatial_ref :
      PROJCS["unknown",GEOGCS["unknown",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]]],PROJECTION["Albers_Conic_Equal_Area"],PARAMETER["latitude_of_center",45],PARAMETER["longitude_of_center",-115],PARAMETER["standard_parallel_1",42],PARAMETER["standard_parallel_2",48],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH]]
      GeoTransform :
      -2309024.326432758 5000.0 0.0 1388393.546326695 0.0 -5000.0
      array(0)
    • x
      PandasIndex
      PandasIndex(Index([-1496524.326432758, -1491524.326432758, -1486524.326432758,
             -1481524.326432758, -1476524.326432758, -1471524.326432758,
             -1466524.326432758, -1461524.326432758, -1456524.326432758,
             -1451524.326432758,
             ...
              1453475.673567242,  1458475.673567242,  1463475.673567242,
              1468475.673567242,  1473475.673567242,  1478475.673567242,
              1483475.673567242,  1488475.673567242,  1493475.673567242,
              1498475.673567242],
            dtype='float64', name='x', length=600))
    • y
      PandasIndex
      PandasIndex(Index([ 995893.546326695,  990893.546326695,  985893.546326695,
              980893.546326695,  975893.546326695,  970893.546326695,
              965893.546326695,  960893.546326695,  955893.546326695,
              950893.546326695,
             ...
             -704106.453673305, -709106.453673305, -714106.453673305,
             -719106.453673305, -724106.453673305, -729106.453673305,
             -734106.453673305, -739106.453673305, -744106.453673305,
             -749106.453673305],
            dtype='float64', name='y', length=350))
  • units :
    ug m-3
    long_name :
    Bias corrected surface total PM2.5 mass concentration

3. Get and process VIIRS data¶

  • point data (04/01/22 tp 11/01/22)
  • aggregate in 2-week chunks
  • produce 2d histograms on the same grid as above
  • smooth with a Gaussian kernel and convolution
InĀ [13]:
# Fire vector data
viirs = Path("viirs_data/DL_FIRE_J1V-C2_550835_040122_110122")
viirs.exists()
Out[13]:
True

use geopandas to match the projection and bounds from Step 2¶

InĀ [14]:
viirs_data = gpd.read_file(viirs, bbox=(lon_min,lat_min,lon_max,lat_max))
viirs_data = viirs_data.to_crs(crs_proj4)
InĀ [15]:
ends = [datetime.utcfromtimestamp(time.astype('O') / 1e9) for time in combined_ds.time.values]
starts = [end - timedelta(weeks=2) for end in ends]
periods = [f for f in zip(starts, ends)]
print(periods)
[(datetime.datetime(2022, 5, 8, 12, 30), datetime.datetime(2022, 5, 22, 12, 30)), (datetime.datetime(2022, 4, 24, 12, 30), datetime.datetime(2022, 5, 8, 12, 30)), (datetime.datetime(2022, 4, 17, 12, 30), datetime.datetime(2022, 5, 1, 12, 30)), (datetime.datetime(2022, 5, 1, 12, 30), datetime.datetime(2022, 5, 15, 12, 30)), (datetime.datetime(2022, 5, 15, 12, 30), datetime.datetime(2022, 5, 29, 12, 30)), (datetime.datetime(2022, 5, 22, 12, 30), datetime.datetime(2022, 6, 5, 12, 30)), (datetime.datetime(2022, 5, 29, 12, 30), datetime.datetime(2022, 6, 12, 12, 30)), (datetime.datetime(2022, 6, 5, 12, 30), datetime.datetime(2022, 6, 19, 12, 30)), (datetime.datetime(2022, 6, 12, 12, 30), datetime.datetime(2022, 6, 26, 12, 30)), (datetime.datetime(2022, 6, 19, 12, 30), datetime.datetime(2022, 7, 3, 12, 30)), (datetime.datetime(2022, 6, 26, 12, 30), datetime.datetime(2022, 7, 10, 12, 30)), (datetime.datetime(2022, 7, 3, 12, 30), datetime.datetime(2022, 7, 17, 12, 30)), (datetime.datetime(2022, 7, 10, 12, 30), datetime.datetime(2022, 7, 24, 12, 30)), (datetime.datetime(2022, 7, 17, 12, 30), datetime.datetime(2022, 7, 31, 12, 30)), (datetime.datetime(2022, 7, 24, 12, 30), datetime.datetime(2022, 8, 7, 12, 30)), (datetime.datetime(2022, 7, 31, 12, 30), datetime.datetime(2022, 8, 14, 12, 30)), (datetime.datetime(2022, 8, 7, 12, 30), datetime.datetime(2022, 8, 21, 12, 30)), (datetime.datetime(2022, 8, 14, 12, 30), datetime.datetime(2022, 8, 28, 12, 30)), (datetime.datetime(2022, 8, 21, 12, 30), datetime.datetime(2022, 9, 4, 12, 30)), (datetime.datetime(2022, 8, 28, 12, 30), datetime.datetime(2022, 9, 11, 12, 30)), (datetime.datetime(2022, 9, 4, 12, 30), datetime.datetime(2022, 9, 18, 12, 30)), (datetime.datetime(2022, 9, 11, 12, 30), datetime.datetime(2022, 9, 25, 12, 30)), (datetime.datetime(2022, 9, 18, 12, 30), datetime.datetime(2022, 10, 2, 12, 30)), (datetime.datetime(2022, 9, 25, 12, 30), datetime.datetime(2022, 10, 9, 12, 30)), (datetime.datetime(2022, 10, 2, 12, 30), datetime.datetime(2022, 10, 16, 12, 30)), (datetime.datetime(2022, 10, 9, 12, 30), datetime.datetime(2022, 10, 23, 12, 30)), (datetime.datetime(2022, 10, 16, 12, 30), datetime.datetime(2022, 10, 30, 12, 30))]

Convert point data to raster¶

  • for each 2-week period, calculate a 2d histogram (on the same grid as in Step 2)

  • next, use a kernel and a convolution to map each pixel in the 2d histogram to some weighted combination of its neighbor's values

    • note, this is similar to the concepts of spatial weights (kernel) and spatial lag (convolutions)

    • in our case, the output resolution = input resolution

"plain" convolution transposed convolution dilated convolution

from https://towardsdatascience.com/types-of-convolutions-in-deep-learning-717013397f4d

use a "gaussian kernel" to determine the weights of your kernel pixels¶

  • heavily weight values close the center of the kernel
https://zbigatron.com/wp-content/uploads/2018/04/gaussiankernel.jpg No description has been provided for this image

from https://kandi.openweaver.com/javascript/aaron9000/gaussian-kernel-calculator

InĀ [16]:
bandwidth = 50000 # meters

# Compute the standard deviation in terms of grid cells
sigma = bandwidth / resolution

from scipy.signal import fftconvolve

def gaussian_kernel(sigma):
    """Creates an anisotropic Gaussian kernel."""
    # Define kernel sizes
    kernel_size = int(6 * sigma)
    x = np.linspace(-kernel_size // 2, kernel_size // 2, kernel_size)
    y = np.linspace(-kernel_size // 2, kernel_size // 2, kernel_size)
    x, y = np.meshgrid(x, y)
    kernel = np.exp(-(x**2 / (2 * sigma**2) + y**2 / (2 * sigma**2)))
    return kernel

# Create the kernel
kernel = gaussian_kernel(sigma)

What does our gaussian kernel look like?¶

InĀ [17]:
plt.figure(figsize=(4,4))
plt.imshow(kernel, cmap='viridis')
plt.title("Gaussian Kernel")
plt.colorbar()
plt.show()
No description has been provided for this image
InĀ [19]:
def create_fire_signal(start, end):
    filter = (viirs_data.ACQ_DATE >= start.strftime("%Y-%m-%d")) & (viirs_data.ACQ_DATE < end.strftime("%Y-%m-%d"))
    filtered_viirs = viirs_data[filter]
    weights = filtered_viirs['FRP'].values
    x = filtered_viirs.geometry.x.values
    y = filtered_viirs.geometry.y.values
    
    # First do the analysis without weighting by FRP, to verify the fire locations with the above plot
    heatmap, xedges, yedges = np.histogram2d(y, x, bins=list(combined_ds.MERRA2_CNN_Surface_PM25.shape[1:3]), range=[[y_min, y_max], [x_min, x_max]], weights=weights)

    # flip y axis to match the image (required debugging to see I needed to do this)
    heatmap = np.flipud(heatmap)
    
    # Perform convolution
    smoothed_heatmap = fftconvolve(heatmap, kernel, mode='same')
    
    # Set values lower than zero to zero
    smoothed_heatmap[smoothed_heatmap < 0] = 0.0

    return smoothed_heatmap
    
heatmap = create_fire_signal(periods[0][0], periods[0][1])
InĀ [56]:
# data structure for each of our our prediction pairs
from collections import namedtuple

RawData = namedtuple("RawData", "viirs merra2 predicted date")
samples = {}

for start, end in periods:
    date_str = datetime.strftime(end, "%Y-%m-%d")
    viirs = create_fire_signal(start, end)
    merra2 = combined_ds.sel(time=end).MERRA2_CNN_Surface_PM25.values
    samples[date_str] = RawData(viirs=viirs, merra2=merra2, predicted=None, date=date_str)

Run the convolution on each of the 2-week periods¶

  • 27 periods
  • each aligns with a PM2.5 sample date
InĀ [45]:
# Create empty variable the same size as the PM2.5 data so we can populate it
combined_ds["fire_signal"] = (("time", "y", "x"), np.full(combined_ds["MERRA2_CNN_Surface_PM25"].shape, np.nan))

# Add metadata
combined_ds["fire_signal"].attrs = {
    "units": "Energy?",
    "description": "Viirs fire observations in the previous 2 weeks, scaled by a Gaussian kernel of bandwidth {0} meters".format(bandwidth),
}

# Note: we could work out the units more precisely, each viirs point is 375m * 375m, we have the units of the FRP, and we know the size of the grid we're binning to and the kernel we're using. It should end up being energy per raster cell, or equivalently energy density.

# go through the time points, create the fire signal, and populate the binned_variable
for start, end in periods:
    date_str = datetime.strftime(end, "%Y-%m-%d")
    viirs = create_fire_signal(start, end)

    if np.datetime64(end) not in combined_ds['time']:
        raise ValueError(f"Date {date_str} not found in MERRA-2 dataset")

    combined_ds["fire_signal"].loc[dict(time=np.datetime64(end))] = viirs

combined_ds
Out[45]:
<xarray.Dataset> Size: 159MB
Dimensions:                  (time: 27, y: 350, x: 600)
Coordinates:
  * x                        (x) float64 5kB -1.497e+06 -1.492e+06 ... 1.498e+06
  * y                        (y) float64 3kB 9.959e+05 9.909e+05 ... -7.491e+05
  * time                     (time) datetime64[ns] 216B 2022-05-22T12:30:00 ....
    spatial_ref              int64 8B 0
Data variables:
    MERRA2_CNN_Surface_PM25  (time, y, x) float32 23MB 2.803 2.917 ... 10.38
    QFLAG                    (time, y, x) float64 45MB 2.0 2.0 2.0 ... 4.0 4.0
    fire_signal              (time, y, x) float64 45MB 2.954e-14 ... 2.236
    prediction               (time, y, x) float64 45MB nan nan nan ... nan nan
Attributes: (12/33)
    Comment:                           filename: MERRA2_HAQAST_CNN_L4_V1_2022...
    Filename:                          MERRA2_HAQAST_CNN_L4_V1_20220522.nc4
    Conventions:                       CF-1
    Institution:                       NASA Goddard Space Flight Center
    References:                        http://gmao.gsfc.nasa.gov ; https://do...
    Format:                            NetCDF-4/HDF-5
    ...                                ...
    RangeEndingDate:                   2022-05-22
    RangeEndingTime:                   23:59:59.000000
    DODS_EXTRA.Unlimited_Dimension:    time
    history:                           2023-04-21 04:24:38 GMT Hyrax-1.16.3 h...
    ProcessingLevel:                   4
    MapProjection:                     Geographic lat/lon. Datum: WGS-84
xarray.Dataset
    • time: 27
    • y: 350
    • x: 600
    • x
      (x)
      float64
      -1.497e+06 -1.492e+06 ... 1.498e+06
      axis :
      X
      long_name :
      x coordinate of projection
      standard_name :
      projection_x_coordinate
      units :
      metre
      array([-1496524.326433, -1491524.326433, -1486524.326433, ...,  1488475.673567,
              1493475.673567,  1498475.673567])
    • y
      (y)
      float64
      9.959e+05 9.909e+05 ... -7.491e+05
      axis :
      Y
      long_name :
      y coordinate of projection
      standard_name :
      projection_y_coordinate
      units :
      metre
      array([ 995893.546327,  990893.546327,  985893.546327, ..., -739106.453673,
             -744106.453673, -749106.453673])
    • time
      (time)
      datetime64[ns]
      2022-05-22T12:30:00 ... 2022-10-...
      long_name :
      time
      array(['2022-05-22T12:30:00.000000000', '2022-05-08T12:30:00.000000000',
             '2022-05-01T12:30:00.000000000', '2022-05-15T12:30:00.000000000',
             '2022-05-29T12:30:00.000000000', '2022-06-05T12:30:00.000000000',
             '2022-06-12T12:30:00.000000000', '2022-06-19T12:30:00.000000000',
             '2022-06-26T12:30:00.000000000', '2022-07-03T12:30:00.000000000',
             '2022-07-10T12:30:00.000000000', '2022-07-17T12:30:00.000000000',
             '2022-07-24T12:30:00.000000000', '2022-07-31T12:30:00.000000000',
             '2022-08-07T12:30:00.000000000', '2022-08-14T12:30:00.000000000',
             '2022-08-21T12:30:00.000000000', '2022-08-28T12:30:00.000000000',
             '2022-09-04T12:30:00.000000000', '2022-09-11T12:30:00.000000000',
             '2022-09-18T12:30:00.000000000', '2022-09-25T12:30:00.000000000',
             '2022-10-02T12:30:00.000000000', '2022-10-09T12:30:00.000000000',
             '2022-10-16T12:30:00.000000000', '2022-10-23T12:30:00.000000000',
             '2022-10-30T12:30:00.000000000'], dtype='datetime64[ns]')
    • spatial_ref
      ()
      int64
      0
      crs_wkt :
      PROJCS["unknown",GEOGCS["unknown",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]]],PROJECTION["Albers_Conic_Equal_Area"],PARAMETER["latitude_of_center",45],PARAMETER["longitude_of_center",-115],PARAMETER["standard_parallel_1",42],PARAMETER["standard_parallel_2",48],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH]]
      semi_major_axis :
      6378137.0
      semi_minor_axis :
      6356752.314245179
      inverse_flattening :
      298.257223563
      reference_ellipsoid_name :
      WGS 84
      longitude_of_prime_meridian :
      0.0
      prime_meridian_name :
      Greenwich
      geographic_crs_name :
      unknown
      horizontal_datum_name :
      World Geodetic System 1984
      projected_crs_name :
      unknown
      grid_mapping_name :
      albers_conical_equal_area
      standard_parallel :
      (42.0, 48.0)
      latitude_of_projection_origin :
      45.0
      longitude_of_central_meridian :
      -115.0
      false_easting :
      0.0
      false_northing :
      0.0
      spatial_ref :
      PROJCS["unknown",GEOGCS["unknown",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]]],PROJECTION["Albers_Conic_Equal_Area"],PARAMETER["latitude_of_center",45],PARAMETER["longitude_of_center",-115],PARAMETER["standard_parallel_1",42],PARAMETER["standard_parallel_2",48],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH]]
      GeoTransform :
      -2309024.326432758 5000.0 0.0 1388393.546326695 0.0 -5000.0
      array(0)
    • MERRA2_CNN_Surface_PM25
      (time, y, x)
      float32
      2.803 2.917 2.855 ... 10.31 10.38
      units :
      ug m-3
      long_name :
      Bias corrected surface total PM2.5 mass concentration
      array([[[2.80319929e+00, 2.91730952e+00, 2.85547781e+00, ...,
               1.24593735e-05, 9.45536813e-05, 2.72322621e-04],
              [2.87739038e+00, 2.98492169e+00, 2.88842416e+00, ...,
               1.42758108e-05, 5.31091937e-05, 2.49106961e-04],
              [2.96227264e+00, 3.01779151e+00, 2.92145991e+00, ...,
               1.13591559e-05, 5.70225609e-07, 2.14771106e-04],
              ...,
              [8.20281133e-02, 1.00274779e-01, 1.19042166e-01, ...,
               1.13938737e+00, 1.02916467e+00, 9.23335314e-01],
              [1.06056236e-01, 1.23177528e-01, 1.40818924e-01, ...,
               1.15250039e+00, 1.05170274e+00, 9.55293298e-01],
              [1.29588157e-01, 1.45585299e-01, 1.62101895e-01, ...,
               1.16146147e+00, 1.07007885e+00, 9.83079076e-01]],
      
             [[1.88523643e-02, 1.76367667e-02, 1.69822145e-02, ...,
               1.05465078e+00, 1.03993058e+00, 1.06375015e+00],
              [1.87412091e-02, 1.76020432e-02, 1.70893967e-02, ...,
               1.07569540e+00, 1.03132951e+00, 1.05612016e+00],
              [1.86296050e-02, 1.76992510e-02, 1.72118116e-02, ...,
               1.10119867e+00, 1.02213800e+00, 1.04789841e+00],
      ...
              [3.79566103e-01, 3.65422308e-01, 3.50830168e-01, ...,
               5.56936026e+00, 5.67352676e+00, 5.77631760e+00],
              [3.71755898e-01, 3.58579844e-01, 3.44955981e-01, ...,
               5.51105738e+00, 5.61222744e+00, 5.71202326e+00],
              [3.64373237e-01, 3.52163911e-01, 3.39507282e-01, ...,
               5.45407152e+00, 5.55224848e+00, 5.64905310e+00]],
      
             [[7.90751982e+00, 7.91771650e+00, 7.92481327e+00, ...,
               8.53381634e-01, 8.68257940e-01, 8.91305208e-01],
              [7.87675524e+00, 7.88557339e+00, 7.89231062e+00, ...,
               8.62468064e-01, 8.74697447e-01, 9.00063932e-01],
              [7.84532595e+00, 7.85294247e+00, 7.86002302e+00, ...,
               8.70708168e-01, 8.79723012e-01, 9.07405496e-01],
              ...,
              [3.86608690e-01, 3.52564454e-01, 3.20915490e-01, ...,
               1.04376144e+01, 1.05251646e+01, 1.06089659e+01],
              [4.20164615e-01, 3.81019294e-01, 3.44266385e-01, ...,
               1.03375454e+01, 1.04170561e+01, 1.04928226e+01],
              [4.51468110e-01, 4.07227188e-01, 3.65375817e-01, ...,
               1.02410116e+01, 1.03124924e+01, 1.03802328e+01]]], dtype=float32)
    • QFLAG
      (time, y, x)
      float64
      2.0 2.0 2.0 2.0 ... 4.0 4.0 4.0 4.0
      FLAG 1 :
      Low quality
      FLAG 2 :
      Medium quality over ocean
      FLAG 3 :
      High quality
      FLAG 4 :
      Highest quality
      Recommendation :
      qflag recommnendation -3 & 4 for quantitative analysis
      long_name :
      Quality Flag for data quality at grid
      units :
      unitless
      array([[[2., 2., 2., ..., 3., 3., 3.],
              [2., 2., 2., ..., 3., 3., 3.],
              [2., 2., 2., ..., 3., 3., 3.],
              ...,
              [2., 2., 2., ..., 4., 4., 4.],
              [2., 2., 2., ..., 4., 4., 4.],
              [2., 2., 2., ..., 4., 4., 4.]],
      
             [[2., 2., 2., ..., 3., 3., 3.],
              [2., 2., 2., ..., 3., 3., 3.],
              [2., 2., 2., ..., 3., 3., 3.],
              ...,
              [2., 2., 2., ..., 4., 4., 4.],
              [2., 2., 2., ..., 4., 4., 4.],
              [2., 2., 2., ..., 4., 4., 4.]],
      
             [[2., 2., 2., ..., 3., 3., 3.],
              [2., 2., 2., ..., 3., 3., 3.],
              [2., 2., 2., ..., 3., 3., 3.],
              ...,
      ...
              ...,
              [2., 2., 2., ..., 4., 4., 4.],
              [2., 2., 2., ..., 4., 4., 4.],
              [2., 2., 2., ..., 4., 4., 4.]],
      
             [[2., 2., 2., ..., 3., 3., 3.],
              [2., 2., 2., ..., 3., 3., 3.],
              [2., 2., 2., ..., 3., 3., 3.],
              ...,
              [2., 2., 2., ..., 4., 4., 4.],
              [2., 2., 2., ..., 4., 4., 4.],
              [2., 2., 2., ..., 4., 4., 4.]],
      
             [[2., 2., 2., ..., 3., 3., 3.],
              [2., 2., 2., ..., 3., 3., 3.],
              [2., 2., 2., ..., 3., 3., 3.],
              ...,
              [2., 2., 2., ..., 4., 4., 4.],
              [2., 2., 2., ..., 4., 4., 4.],
              [2., 2., 2., ..., 4., 4., 4.]]])
    • fire_signal
      (time, y, x)
      float64
      2.954e-14 2.276e-14 ... 2.826 2.236
      units :
      Energy?
      description :
      Viirs fire observations in the previous 2 weeks, scaled by a Gaussian kernel of bandwidth 50000 meters
      array([[[2.95429829e-14, 2.27560814e-14, 3.35352779e-14, ...,
               3.67291139e-14, 4.15198679e-14, 4.95044578e-14],
              [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, ...,
               0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
              [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, ...,
               3.03414419e-14, 3.19383599e-14, 2.87445239e-14],
              ...,
              [9.58150797e-15, 1.19768850e-14, 0.00000000e+00, ...,
               2.79046415e+02, 2.37647088e+02, 2.00444895e+02],
              [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, ...,
               2.56672005e+02, 2.18585843e+02, 1.84363262e+02],
              [0.00000000e+00, 2.55506879e-14, 0.00000000e+00, ...,
               2.33729748e+02, 1.99048780e+02, 1.67887324e+02]],
      
             [[0.00000000e+00, 0.00000000e+00, 0.00000000e+00, ...,
               3.81264171e-14, 3.75275729e-14, 3.71283434e-14],
              [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, ...,
               2.71975096e-15, 0.00000000e+00, 0.00000000e+00],
              [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, ...,
               6.15811502e-14, 5.26982938e-14, 4.43144744e-14],
      ...
              [2.04405503e-13, 0.00000000e+00, 0.00000000e+00, ...,
               4.44902521e+00, 3.55724436e+00, 2.81502306e+00],
              [3.59306549e-15, 1.27354210e-13, 1.21166153e-13, ...,
               3.98583411e+00, 3.18697670e+00, 2.52207508e+00],
              [0.00000000e+00, 0.00000000e+00, 3.03414419e-14, ...,
               3.53421525e+00, 2.82594387e+00, 2.23642093e+00]],
      
             [[0.00000000e+00, 0.00000000e+00, 0.00000000e+00, ...,
               4.08811007e-13, 3.06608255e-13, 3.57709631e-13],
              [5.42952118e-14, 7.66520638e-14, 6.38767198e-14, ...,
               3.06608255e-13, 4.59912383e-13, 4.08811007e-13],
              [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, ...,
               4.59912383e-13, 5.62115134e-13, 5.11013758e-13],
              ...,
              [1.75660979e-13, 0.00000000e+00, 1.66079471e-13, ...,
               4.44902521e+00, 3.55724436e+00, 2.81502306e+00],
              [3.99229499e-13, 0.00000000e+00, 0.00000000e+00, ...,
               3.98583411e+00, 3.18697670e+00, 2.52207508e+00],
              [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, ...,
               3.53421525e+00, 2.82594387e+00, 2.23642093e+00]]])
    • prediction
      (time, y, x)
      float64
      nan nan nan nan ... nan nan nan nan
      units :
      PM2.5
      description :
      Prediction from simple linear regression of fire signal
      array([[[nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              ...,
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan]],
      
             [[nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              ...,
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan]],
      
             [[nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              ...,
      ...
              ...,
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan]],
      
             [[nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              ...,
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan]],
      
             [[nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              ...,
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan]]])
    • x
      PandasIndex
      PandasIndex(Index([-1496524.326432758, -1491524.326432758, -1486524.326432758,
             -1481524.326432758, -1476524.326432758, -1471524.326432758,
             -1466524.326432758, -1461524.326432758, -1456524.326432758,
             -1451524.326432758,
             ...
              1453475.673567242,  1458475.673567242,  1463475.673567242,
              1468475.673567242,  1473475.673567242,  1478475.673567242,
              1483475.673567242,  1488475.673567242,  1493475.673567242,
              1498475.673567242],
            dtype='float64', name='x', length=600))
    • y
      PandasIndex
      PandasIndex(Index([ 995893.546326695,  990893.546326695,  985893.546326695,
              980893.546326695,  975893.546326695,  970893.546326695,
              965893.546326695,  960893.546326695,  955893.546326695,
              950893.546326695,
             ...
             -704106.453673305, -709106.453673305, -714106.453673305,
             -719106.453673305, -724106.453673305, -729106.453673305,
             -734106.453673305, -739106.453673305, -744106.453673305,
             -749106.453673305],
            dtype='float64', name='y', length=350))
    • time
      PandasIndex
      PandasIndex(DatetimeIndex(['2022-05-22 12:30:00', '2022-05-08 12:30:00',
                     '2022-05-01 12:30:00', '2022-05-15 12:30:00',
                     '2022-05-29 12:30:00', '2022-06-05 12:30:00',
                     '2022-06-12 12:30:00', '2022-06-19 12:30:00',
                     '2022-06-26 12:30:00', '2022-07-03 12:30:00',
                     '2022-07-10 12:30:00', '2022-07-17 12:30:00',
                     '2022-07-24 12:30:00', '2022-07-31 12:30:00',
                     '2022-08-07 12:30:00', '2022-08-14 12:30:00',
                     '2022-08-21 12:30:00', '2022-08-28 12:30:00',
                     '2022-09-04 12:30:00', '2022-09-11 12:30:00',
                     '2022-09-18 12:30:00', '2022-09-25 12:30:00',
                     '2022-10-02 12:30:00', '2022-10-09 12:30:00',
                     '2022-10-16 12:30:00', '2022-10-23 12:30:00',
                     '2022-10-30 12:30:00'],
                    dtype='datetime64[ns]', name='time', freq=None))
  • Comment :
    filename: MERRA2_HAQAST_CNN_L4_V1_20220522.nc4
    Filename :
    MERRA2_HAQAST_CNN_L4_V1_20220522.nc4
    Conventions :
    CF-1
    Institution :
    NASA Goddard Space Flight Center
    References :
    http://gmao.gsfc.nasa.gov ; https://doi.org/10.4209/aaqr.210105 ; https://doi.org/10.1029/2022EA002375
    Format :
    NetCDF-4/HDF-5
    SpatialCoverage :
    global
    VersionID :
    V1
    TemporalRange :
    2000-01-01 -> 2024-12-31
    identifier_product_doi_authority :
    http://dx.doi.org/
    ShortName :
    MERRA2_CNN_HAQAST_PM25
    GranuleID :
    MERRA2_HAQAST_CNN_L4_V1_20220522.nc4
    ProductionDateTime :
    2024-06-18T06:44:33.646033Z
    LongName :
    MERRA2_CNN_HAQAST bias corrected global hourly surface total PM2.5 mass concentration V1.0
    title :
    MERRA2_CNN_HAQAST bias corrected global hourly surface total PM2.5 mass concentration V1.0
    SouthernmostLatitude :
    -90.0
    NorthernmostLatitude :
    90.0
    WesternmostLongitude :
    -180.0
    EasternmostLongitude :
    179.375
    LatitudeResolution :
    0.5
    LongitudeResolution :
    0.625
    DataResolution :
    0.5 x 0.625
    Source :
    M2T1NXAER_5.12.4: BCSMASS,DUSMASS25,OCSMASS,SO2SMASS,SO4SMASS,SSSMASS25,TOTEXTTAU
    Contact :
    Pawan Gupta (pawan.gupta@nasa.gov), and Alqamah Sayeed (alqamah.sayeed@nasa.gov)
    identifier_product_doi :
    10.5067/OCKK5HCFW5N3
    RangeBeginningDate :
    2022-05-22
    RangeBeginningTime :
    00:00:00.000000
    RangeEndingDate :
    2022-05-22
    RangeEndingTime :
    23:59:59.000000
    DODS_EXTRA.Unlimited_Dimension :
    time
    history :
    2023-04-21 04:24:38 GMT Hyrax-1.16.3 https://goldsmr4.gesdisc.eosdis.nasa.gov/opendap/MERRA2/M2T1NXAER.5.12.4/2022/12/MERRA2_400.tavg1_2d_aer_Nx.20221231.nc4.nc4?BCSMASS,DUSMASS25,OCSMASS,SO2SMASS,SO4SMASS,SSSMASS25,TOTEXTTAU
    ProcessingLevel :
    4
    MapProjection :
    Geographic lat/lon. Datum: WGS-84

Linear Regression¶

In our simple model, we'll predict PM2.5 at each grid point based solely on the VIIRS value at that grid point (independence assumption).¶

  • note, no recognition of spatial effects

from https://towardsdatascience.com/building-linear-regression-least-squares-with-linear-algebra-2adf071dd5dd

First, linearize the data¶

  • $5 \cdot 10^6$ points!
InĀ [21]:
# first, linearize the data
X = combined_ds["fire_signal"].values.flatten()
y = combined_ds["MERRA2_CNN_Surface_PM25"].values.flatten()
InĀ [22]:
print(X.shape)
print(y.shape)
(5670000,)
(5670000,)
InĀ [23]:
# convert to a 1d vector for linear regression
X = np.reshape(X, (len(X), 1))
y = np.reshape(y, (len(y), 1))
InĀ [24]:
print(X.shape)
print(y.shape)
(5670000, 1)
(5670000, 1)

Execute linear regression with scikit-learn¶

InĀ [25]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
InĀ [26]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=5)

time the execution (on $5*10^6$ points!), just for fun..

InĀ [27]:
import time
start_time = time.time()

model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)


end_time = time.time()
elapsed_time = end_time - start_time
print(f"Execution time: {elapsed_time:.2f} seconds")
Execution time: 0.06 seconds

How does the model perform?¶

InĀ [28]:
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error: {mse}  (min|max|mean = {y.min()} | {y.max()} | {y.mean()})")
print(f"R² Score: {r2}")
Mean Squared Error: 29.41396587869877  (min|max|mean = 0.0 | 197.2798309326172 | 3.2790627479553223)
R² Score: 0.07299203797362785
InĀ [29]:
fig, ax = plt.subplots(figsize=(8,6))

ax.scatter(y_test, y_pred, color='blue', s=2)

x_line = np.linspace(0,100,100)
y_line = np.linspace(0,100,100)

ax.plot(x_line, y_line, color="black", linestyle="-", label="Identity Line (Predicted==Actual)")

ax.set_xlim(0, 200)
ax.set_ylim(0, 100)

ax.set_xlabel("Actual PM2.5 (ug/m-3)")
ax.set_ylabel("Predicted PM2.5 (ug/m-3)")

ax.legend()
Out[29]:
<matplotlib.legend.Legend at 0x7f531d29fed0>
No description has been provided for this image

What does this mean?¶

It's not very good:

  • Mean Squared Error: This indicates that the average squared difference between the predicted value and the actual value is about 9x the mean value.

  • $R^2$: An $R^2$ of 0.0869 means that about 9% of the dependent variable (PM2.5 levels) is predictable from the independent variable (VIIRS fire intensity data).

  • There was a floor to the predictions of about 3 uq/m-3

  • The "peaky" shapes in the regression scatterplot are interesting, and possibly suggest unexplored spatial relationships in the data, or single time-points where we have spatially correlated error

  • Aside - this data plot makes it very hard to see how closely related the data is as we have 5 million points that can be very close together. An alternative is a 2d histogram. However the R^2 number is already quite informative on the quality of the model

InĀ [31]:
# create a 2d hist plot with a log scale on the bins
fig, ax = plt.subplots(figsize=(8,6))

hb = ax.hexbin(y_test, y_pred, gridsize=50, cmap='inferno', bins='log')

ax.set_xlabel('VIIRS FRP')
ax.set_ylabel('MERRA2 PM2.5')
ax.set_title('VIIRS Fire data 2DHist')
fig.colorbar(hb, ax=ax)

ax.plot(x_line, y_line, color="blue", linestyle="-", linewidth=2)

ax.set_xlim(0, 100)
ax.set_ylim(0, 100)
plt.show()
No description has been provided for this image

Plot of relationship between fire feature and PM2.5 over the test data set¶

InĀ [32]:
x_line = np.linspace(0,200000,100)
y_line = x_line * model.coef_[0] + model.intercept_

# create a 2d hist plot with a log scale on the bins
fig, ax = plt.subplots(figsize=(8,6))


hb = ax.hexbin(X_test.flatten(), y_test.flatten(), gridsize=50, cmap='inferno', bins='log')

ax.set_xlabel('VIIRS FRP')
ax.set_ylabel('MERRA2 PM2.5')
ax.set_title('VIIRS Fire data 2DHist')
fig.colorbar(hb, ax=ax)

ax.plot(x_line, y_line, color="blue", linestyle="-", linewidth=2)

ax.set_xlim(0, 100000)
ax.set_ylim(0, 100)
plt.show()
No description has been provided for this image

Displaying the Prediction Results¶

We want to show each of the predictions that our model made and compare them to the actual values.¶

InĀ [33]:
# First, collect the model predictions for all 27 samples
y_full_pred = model.predict(X)    # technically, a concatenation of y_train and y_test

However, we linearized our data for regression...¶

We need to reshape it for mapping.¶

InĀ [46]:
# reshape the predictions and match with dates

#chunk = 350 * 600  # the number of pixels in a single image
#predictions = {date: y_full_pred[(chunk * i):(chunk*(i+1))].reshape(350,600) for i, date in enumerate(samples)}

# Create empty variable the same size as the PM2.5 data so we can populate it
combined_ds["prediction"] = (("time", "y", "x"), np.full(combined_ds["MERRA2_CNN_Surface_PM25"].shape, np.nan))

# Add metadata
combined_ds["prediction"].attrs = {
    "units": "PM2.5",
    "description": "Prediction from simple linear regression of fire signal".format(bandwidth),
}

# go through the time points, create the fire signal, and populate the binned_variable
for start, end in periods:
    if np.datetime64(end) not in combined_ds['time']:
        raise ValueError(f"Date {end} not found in MERRA-2 dataset")

    pred = combined_ds["MERRA2_CNN_Surface_PM25"].loc[dict(time=np.datetime64(end))].values * model.coef_[0] + model.intercept_
    combined_ds["prediction"].loc[dict(time=np.datetime64(end))] = pred

combined_ds
Out[46]:
<xarray.Dataset> Size: 159MB
Dimensions:                  (time: 27, y: 350, x: 600)
Coordinates:
  * x                        (x) float64 5kB -1.497e+06 -1.492e+06 ... 1.498e+06
  * y                        (y) float64 3kB 9.959e+05 9.909e+05 ... -7.491e+05
  * time                     (time) datetime64[ns] 216B 2022-05-22T12:30:00 ....
    spatial_ref              int64 8B 0
Data variables:
    MERRA2_CNN_Surface_PM25  (time, y, x) float32 23MB 2.803 2.917 ... 10.38
    QFLAG                    (time, y, x) float64 45MB 2.0 2.0 2.0 ... 4.0 4.0
    fire_signal              (time, y, x) float64 45MB 2.954e-14 ... 2.236
    prediction               (time, y, x) float64 45MB 3.09 3.09 ... 3.094 3.094
Attributes: (12/33)
    Comment:                           filename: MERRA2_HAQAST_CNN_L4_V1_2022...
    Filename:                          MERRA2_HAQAST_CNN_L4_V1_20220522.nc4
    Conventions:                       CF-1
    Institution:                       NASA Goddard Space Flight Center
    References:                        http://gmao.gsfc.nasa.gov ; https://do...
    Format:                            NetCDF-4/HDF-5
    ...                                ...
    RangeEndingDate:                   2022-05-22
    RangeEndingTime:                   23:59:59.000000
    DODS_EXTRA.Unlimited_Dimension:    time
    history:                           2023-04-21 04:24:38 GMT Hyrax-1.16.3 h...
    ProcessingLevel:                   4
    MapProjection:                     Geographic lat/lon. Datum: WGS-84
xarray.Dataset
    • time: 27
    • y: 350
    • x: 600
    • x
      (x)
      float64
      -1.497e+06 -1.492e+06 ... 1.498e+06
      axis :
      X
      long_name :
      x coordinate of projection
      standard_name :
      projection_x_coordinate
      units :
      metre
      array([-1496524.326433, -1491524.326433, -1486524.326433, ...,  1488475.673567,
              1493475.673567,  1498475.673567])
    • y
      (y)
      float64
      9.959e+05 9.909e+05 ... -7.491e+05
      axis :
      Y
      long_name :
      y coordinate of projection
      standard_name :
      projection_y_coordinate
      units :
      metre
      array([ 995893.546327,  990893.546327,  985893.546327, ..., -739106.453673,
             -744106.453673, -749106.453673])
    • time
      (time)
      datetime64[ns]
      2022-05-22T12:30:00 ... 2022-10-...
      long_name :
      time
      array(['2022-05-22T12:30:00.000000000', '2022-05-08T12:30:00.000000000',
             '2022-05-01T12:30:00.000000000', '2022-05-15T12:30:00.000000000',
             '2022-05-29T12:30:00.000000000', '2022-06-05T12:30:00.000000000',
             '2022-06-12T12:30:00.000000000', '2022-06-19T12:30:00.000000000',
             '2022-06-26T12:30:00.000000000', '2022-07-03T12:30:00.000000000',
             '2022-07-10T12:30:00.000000000', '2022-07-17T12:30:00.000000000',
             '2022-07-24T12:30:00.000000000', '2022-07-31T12:30:00.000000000',
             '2022-08-07T12:30:00.000000000', '2022-08-14T12:30:00.000000000',
             '2022-08-21T12:30:00.000000000', '2022-08-28T12:30:00.000000000',
             '2022-09-04T12:30:00.000000000', '2022-09-11T12:30:00.000000000',
             '2022-09-18T12:30:00.000000000', '2022-09-25T12:30:00.000000000',
             '2022-10-02T12:30:00.000000000', '2022-10-09T12:30:00.000000000',
             '2022-10-16T12:30:00.000000000', '2022-10-23T12:30:00.000000000',
             '2022-10-30T12:30:00.000000000'], dtype='datetime64[ns]')
    • spatial_ref
      ()
      int64
      0
      crs_wkt :
      PROJCS["unknown",GEOGCS["unknown",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]]],PROJECTION["Albers_Conic_Equal_Area"],PARAMETER["latitude_of_center",45],PARAMETER["longitude_of_center",-115],PARAMETER["standard_parallel_1",42],PARAMETER["standard_parallel_2",48],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH]]
      semi_major_axis :
      6378137.0
      semi_minor_axis :
      6356752.314245179
      inverse_flattening :
      298.257223563
      reference_ellipsoid_name :
      WGS 84
      longitude_of_prime_meridian :
      0.0
      prime_meridian_name :
      Greenwich
      geographic_crs_name :
      unknown
      horizontal_datum_name :
      World Geodetic System 1984
      projected_crs_name :
      unknown
      grid_mapping_name :
      albers_conical_equal_area
      standard_parallel :
      (42.0, 48.0)
      latitude_of_projection_origin :
      45.0
      longitude_of_central_meridian :
      -115.0
      false_easting :
      0.0
      false_northing :
      0.0
      spatial_ref :
      PROJCS["unknown",GEOGCS["unknown",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]]],PROJECTION["Albers_Conic_Equal_Area"],PARAMETER["latitude_of_center",45],PARAMETER["longitude_of_center",-115],PARAMETER["standard_parallel_1",42],PARAMETER["standard_parallel_2",48],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH]]
      GeoTransform :
      -2309024.326432758 5000.0 0.0 1388393.546326695 0.0 -5000.0
      array(0)
    • MERRA2_CNN_Surface_PM25
      (time, y, x)
      float32
      2.803 2.917 2.855 ... 10.31 10.38
      units :
      ug m-3
      long_name :
      Bias corrected surface total PM2.5 mass concentration
      array([[[2.80319929e+00, 2.91730952e+00, 2.85547781e+00, ...,
               1.24593735e-05, 9.45536813e-05, 2.72322621e-04],
              [2.87739038e+00, 2.98492169e+00, 2.88842416e+00, ...,
               1.42758108e-05, 5.31091937e-05, 2.49106961e-04],
              [2.96227264e+00, 3.01779151e+00, 2.92145991e+00, ...,
               1.13591559e-05, 5.70225609e-07, 2.14771106e-04],
              ...,
              [8.20281133e-02, 1.00274779e-01, 1.19042166e-01, ...,
               1.13938737e+00, 1.02916467e+00, 9.23335314e-01],
              [1.06056236e-01, 1.23177528e-01, 1.40818924e-01, ...,
               1.15250039e+00, 1.05170274e+00, 9.55293298e-01],
              [1.29588157e-01, 1.45585299e-01, 1.62101895e-01, ...,
               1.16146147e+00, 1.07007885e+00, 9.83079076e-01]],
      
             [[1.88523643e-02, 1.76367667e-02, 1.69822145e-02, ...,
               1.05465078e+00, 1.03993058e+00, 1.06375015e+00],
              [1.87412091e-02, 1.76020432e-02, 1.70893967e-02, ...,
               1.07569540e+00, 1.03132951e+00, 1.05612016e+00],
              [1.86296050e-02, 1.76992510e-02, 1.72118116e-02, ...,
               1.10119867e+00, 1.02213800e+00, 1.04789841e+00],
      ...
              [3.79566103e-01, 3.65422308e-01, 3.50830168e-01, ...,
               5.56936026e+00, 5.67352676e+00, 5.77631760e+00],
              [3.71755898e-01, 3.58579844e-01, 3.44955981e-01, ...,
               5.51105738e+00, 5.61222744e+00, 5.71202326e+00],
              [3.64373237e-01, 3.52163911e-01, 3.39507282e-01, ...,
               5.45407152e+00, 5.55224848e+00, 5.64905310e+00]],
      
             [[7.90751982e+00, 7.91771650e+00, 7.92481327e+00, ...,
               8.53381634e-01, 8.68257940e-01, 8.91305208e-01],
              [7.87675524e+00, 7.88557339e+00, 7.89231062e+00, ...,
               8.62468064e-01, 8.74697447e-01, 9.00063932e-01],
              [7.84532595e+00, 7.85294247e+00, 7.86002302e+00, ...,
               8.70708168e-01, 8.79723012e-01, 9.07405496e-01],
              ...,
              [3.86608690e-01, 3.52564454e-01, 3.20915490e-01, ...,
               1.04376144e+01, 1.05251646e+01, 1.06089659e+01],
              [4.20164615e-01, 3.81019294e-01, 3.44266385e-01, ...,
               1.03375454e+01, 1.04170561e+01, 1.04928226e+01],
              [4.51468110e-01, 4.07227188e-01, 3.65375817e-01, ...,
               1.02410116e+01, 1.03124924e+01, 1.03802328e+01]]], dtype=float32)
    • QFLAG
      (time, y, x)
      float64
      2.0 2.0 2.0 2.0 ... 4.0 4.0 4.0 4.0
      FLAG 1 :
      Low quality
      FLAG 2 :
      Medium quality over ocean
      FLAG 3 :
      High quality
      FLAG 4 :
      Highest quality
      Recommendation :
      qflag recommnendation -3 & 4 for quantitative analysis
      long_name :
      Quality Flag for data quality at grid
      units :
      unitless
      array([[[2., 2., 2., ..., 3., 3., 3.],
              [2., 2., 2., ..., 3., 3., 3.],
              [2., 2., 2., ..., 3., 3., 3.],
              ...,
              [2., 2., 2., ..., 4., 4., 4.],
              [2., 2., 2., ..., 4., 4., 4.],
              [2., 2., 2., ..., 4., 4., 4.]],
      
             [[2., 2., 2., ..., 3., 3., 3.],
              [2., 2., 2., ..., 3., 3., 3.],
              [2., 2., 2., ..., 3., 3., 3.],
              ...,
              [2., 2., 2., ..., 4., 4., 4.],
              [2., 2., 2., ..., 4., 4., 4.],
              [2., 2., 2., ..., 4., 4., 4.]],
      
             [[2., 2., 2., ..., 3., 3., 3.],
              [2., 2., 2., ..., 3., 3., 3.],
              [2., 2., 2., ..., 3., 3., 3.],
              ...,
      ...
              ...,
              [2., 2., 2., ..., 4., 4., 4.],
              [2., 2., 2., ..., 4., 4., 4.],
              [2., 2., 2., ..., 4., 4., 4.]],
      
             [[2., 2., 2., ..., 3., 3., 3.],
              [2., 2., 2., ..., 3., 3., 3.],
              [2., 2., 2., ..., 3., 3., 3.],
              ...,
              [2., 2., 2., ..., 4., 4., 4.],
              [2., 2., 2., ..., 4., 4., 4.],
              [2., 2., 2., ..., 4., 4., 4.]],
      
             [[2., 2., 2., ..., 3., 3., 3.],
              [2., 2., 2., ..., 3., 3., 3.],
              [2., 2., 2., ..., 3., 3., 3.],
              ...,
              [2., 2., 2., ..., 4., 4., 4.],
              [2., 2., 2., ..., 4., 4., 4.],
              [2., 2., 2., ..., 4., 4., 4.]]])
    • fire_signal
      (time, y, x)
      float64
      2.954e-14 2.276e-14 ... 2.826 2.236
      units :
      Energy?
      description :
      Viirs fire observations in the previous 2 weeks, scaled by a Gaussian kernel of bandwidth 50000 meters
      array([[[2.95429829e-14, 2.27560814e-14, 3.35352779e-14, ...,
               3.67291139e-14, 4.15198679e-14, 4.95044578e-14],
              [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, ...,
               0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
              [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, ...,
               3.03414419e-14, 3.19383599e-14, 2.87445239e-14],
              ...,
              [9.58150797e-15, 1.19768850e-14, 0.00000000e+00, ...,
               2.79046415e+02, 2.37647088e+02, 2.00444895e+02],
              [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, ...,
               2.56672005e+02, 2.18585843e+02, 1.84363262e+02],
              [0.00000000e+00, 2.55506879e-14, 0.00000000e+00, ...,
               2.33729748e+02, 1.99048780e+02, 1.67887324e+02]],
      
             [[0.00000000e+00, 0.00000000e+00, 0.00000000e+00, ...,
               3.81264171e-14, 3.75275729e-14, 3.71283434e-14],
              [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, ...,
               2.71975096e-15, 0.00000000e+00, 0.00000000e+00],
              [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, ...,
               6.15811502e-14, 5.26982938e-14, 4.43144744e-14],
      ...
              [2.04405503e-13, 0.00000000e+00, 0.00000000e+00, ...,
               4.44902521e+00, 3.55724436e+00, 2.81502306e+00],
              [3.59306549e-15, 1.27354210e-13, 1.21166153e-13, ...,
               3.98583411e+00, 3.18697670e+00, 2.52207508e+00],
              [0.00000000e+00, 0.00000000e+00, 3.03414419e-14, ...,
               3.53421525e+00, 2.82594387e+00, 2.23642093e+00]],
      
             [[0.00000000e+00, 0.00000000e+00, 0.00000000e+00, ...,
               4.08811007e-13, 3.06608255e-13, 3.57709631e-13],
              [5.42952118e-14, 7.66520638e-14, 6.38767198e-14, ...,
               3.06608255e-13, 4.59912383e-13, 4.08811007e-13],
              [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, ...,
               4.59912383e-13, 5.62115134e-13, 5.11013758e-13],
              ...,
              [1.75660979e-13, 0.00000000e+00, 1.66079471e-13, ...,
               4.44902521e+00, 3.55724436e+00, 2.81502306e+00],
              [3.99229499e-13, 0.00000000e+00, 0.00000000e+00, ...,
               3.98583411e+00, 3.18697670e+00, 2.52207508e+00],
              [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, ...,
               3.53421525e+00, 2.82594387e+00, 2.23642093e+00]]])
    • prediction
      (time, y, x)
      float64
      3.09 3.09 3.09 ... 3.094 3.094
      units :
      PM2.5
      description :
      Prediction from simple linear regression of fire signal
      array([[[3.08959066, 3.08965537, 3.0896203 , ..., 3.08800101,
               3.08800106, 3.08800116],
              [3.08963273, 3.08969371, 3.08963899, ..., 3.08800102,
               3.08800104, 3.08800115],
              [3.08968086, 3.08971235, 3.08965772, ..., 3.08800101,
               3.08800101, 3.08800113],
              ...,
              [3.08804752, 3.08805787, 3.08806851, ..., 3.08864714,
               3.08858463, 3.08852462],
              [3.08806115, 3.08807086, 3.08808086, ..., 3.08865457,
               3.08859741, 3.08854274],
              [3.08807449, 3.08808357, 3.08809293, ..., 3.08865965,
               3.08860783, 3.0885585 ]],
      
             [[3.0880117 , 3.08801101, 3.08801064, ..., 3.08859908,
               3.08859074, 3.08860424],
              [3.08801164, 3.08801099, 3.0880107 , ..., 3.08861102,
               3.08858586, 3.08859992],
              [3.08801157, 3.08801104, 3.08801077, ..., 3.08862548,
               3.08858065, 3.08859525],
      ...
              [3.08821625, 3.08820823, 3.08819996, ..., 3.0911593 ,
               3.09121837, 3.09127666],
              [3.08821182, 3.08820435, 3.08819663, ..., 3.09112624,
               3.09118361, 3.0912402 ],
              [3.08820764, 3.08820071, 3.08819354, ..., 3.09109392,
               3.0911496 , 3.09120449]],
      
             [[3.09248524, 3.09249102, 3.09249504, ..., 3.08848495,
               3.08849338, 3.08850645],
              [3.09246779, 3.09247279, 3.09247661, ..., 3.0884901 ,
               3.08849703, 3.08851142],
              [3.09244997, 3.09245429, 3.0924583 , ..., 3.08849477,
               3.08849988, 3.08851558],
              ...,
              [3.08822025, 3.08820094, 3.08818299, ..., 3.09392001,
               3.09396966, 3.09401718],
              [3.08823928, 3.08821708, 3.08819624, ..., 3.09386326,
               3.09390835, 3.09395132],
              [3.08825703, 3.08823194, 3.08820821, ..., 3.09380852,
               3.09384906, 3.09388747]]])
    • x
      PandasIndex
      PandasIndex(Index([-1496524.326432758, -1491524.326432758, -1486524.326432758,
             -1481524.326432758, -1476524.326432758, -1471524.326432758,
             -1466524.326432758, -1461524.326432758, -1456524.326432758,
             -1451524.326432758,
             ...
              1453475.673567242,  1458475.673567242,  1463475.673567242,
              1468475.673567242,  1473475.673567242,  1478475.673567242,
              1483475.673567242,  1488475.673567242,  1493475.673567242,
              1498475.673567242],
            dtype='float64', name='x', length=600))
    • y
      PandasIndex
      PandasIndex(Index([ 995893.546326695,  990893.546326695,  985893.546326695,
              980893.546326695,  975893.546326695,  970893.546326695,
              965893.546326695,  960893.546326695,  955893.546326695,
              950893.546326695,
             ...
             -704106.453673305, -709106.453673305, -714106.453673305,
             -719106.453673305, -724106.453673305, -729106.453673305,
             -734106.453673305, -739106.453673305, -744106.453673305,
             -749106.453673305],
            dtype='float64', name='y', length=350))
    • time
      PandasIndex
      PandasIndex(DatetimeIndex(['2022-05-22 12:30:00', '2022-05-08 12:30:00',
                     '2022-05-01 12:30:00', '2022-05-15 12:30:00',
                     '2022-05-29 12:30:00', '2022-06-05 12:30:00',
                     '2022-06-12 12:30:00', '2022-06-19 12:30:00',
                     '2022-06-26 12:30:00', '2022-07-03 12:30:00',
                     '2022-07-10 12:30:00', '2022-07-17 12:30:00',
                     '2022-07-24 12:30:00', '2022-07-31 12:30:00',
                     '2022-08-07 12:30:00', '2022-08-14 12:30:00',
                     '2022-08-21 12:30:00', '2022-08-28 12:30:00',
                     '2022-09-04 12:30:00', '2022-09-11 12:30:00',
                     '2022-09-18 12:30:00', '2022-09-25 12:30:00',
                     '2022-10-02 12:30:00', '2022-10-09 12:30:00',
                     '2022-10-16 12:30:00', '2022-10-23 12:30:00',
                     '2022-10-30 12:30:00'],
                    dtype='datetime64[ns]', name='time', freq=None))
  • Comment :
    filename: MERRA2_HAQAST_CNN_L4_V1_20220522.nc4
    Filename :
    MERRA2_HAQAST_CNN_L4_V1_20220522.nc4
    Conventions :
    CF-1
    Institution :
    NASA Goddard Space Flight Center
    References :
    http://gmao.gsfc.nasa.gov ; https://doi.org/10.4209/aaqr.210105 ; https://doi.org/10.1029/2022EA002375
    Format :
    NetCDF-4/HDF-5
    SpatialCoverage :
    global
    VersionID :
    V1
    TemporalRange :
    2000-01-01 -> 2024-12-31
    identifier_product_doi_authority :
    http://dx.doi.org/
    ShortName :
    MERRA2_CNN_HAQAST_PM25
    GranuleID :
    MERRA2_HAQAST_CNN_L4_V1_20220522.nc4
    ProductionDateTime :
    2024-06-18T06:44:33.646033Z
    LongName :
    MERRA2_CNN_HAQAST bias corrected global hourly surface total PM2.5 mass concentration V1.0
    title :
    MERRA2_CNN_HAQAST bias corrected global hourly surface total PM2.5 mass concentration V1.0
    SouthernmostLatitude :
    -90.0
    NorthernmostLatitude :
    90.0
    WesternmostLongitude :
    -180.0
    EasternmostLongitude :
    179.375
    LatitudeResolution :
    0.5
    LongitudeResolution :
    0.625
    DataResolution :
    0.5 x 0.625
    Source :
    M2T1NXAER_5.12.4: BCSMASS,DUSMASS25,OCSMASS,SO2SMASS,SO4SMASS,SSSMASS25,TOTEXTTAU
    Contact :
    Pawan Gupta (pawan.gupta@nasa.gov), and Alqamah Sayeed (alqamah.sayeed@nasa.gov)
    identifier_product_doi :
    10.5067/OCKK5HCFW5N3
    RangeBeginningDate :
    2022-05-22
    RangeBeginningTime :
    00:00:00.000000
    RangeEndingDate :
    2022-05-22
    RangeEndingTime :
    23:59:59.000000
    DODS_EXTRA.Unlimited_Dimension :
    time
    history :
    2023-04-21 04:24:38 GMT Hyrax-1.16.3 https://goldsmr4.gesdisc.eosdis.nasa.gov/opendap/MERRA2/M2T1NXAER.5.12.4/2022/12/MERRA2_400.tavg1_2d_aer_Nx.20221231.nc4.nc4?BCSMASS,DUSMASS25,OCSMASS,SO2SMASS,SO4SMASS,SSSMASS25,TOTEXTTAU
    ProcessingLevel :
    4
    MapProjection :
    Geographic lat/lon. Datum: WGS-84
InĀ [51]:
# setup the plot
num_samples = len(combined_ds['time'].values)
print(num_samples)

ax1 = list(range(num_samples))
ax2 = list(range(num_samples))
ax3 = list(range(num_samples))
27
InĀ [138]:
# sort the dates
dates_sorted = sorted(list(samples.keys()))
InĀ [401]:
crs_proj4 = ccrs.AlbersEqualArea(
    central_longitude=-115, 
    central_latitude=45,    
    standard_parallels=(42, 48)
)

# Set up figure dimensions
aspect_ratio = (y_max - y_min) / (x_max - x_min)
fig = plt.figure(figsize=(25*aspect_ratio, 25*3))  # reduces distortion


for i in range(num_samples):

    prediction_date = combined_ds['time'].values[i]
    viirs_start = datetime.strftime(datetime.strptime(prediction_date, '%Y-%m-%d') - timedelta(weeks=2), '%Y-%m-%d')

    ax1[i] = fig.add_subplot(num_samples, 3, 1+3*i, projection=crs_proj4)
    ax1[i].set_extent([x_min, x_max, y_min, y_max], crs=crs_proj4) # CONUS extent
    ax1[i].coastlines(resolution='50m')
    ax1[i].add_feature(cfeature.BORDERS)
    ax1[i].add_feature(cfeature.STATES)
    
    ax1[i].set_yticklabels("fsfa")

    if i == 0:
        ax1[i].set_title(f"Input (Cumulative Fire Radiative Power)", fontsize=14, fontweight='bold', y=1.08)
    ax1[i].text(x=0.5, y = 1.02, s = f"{viirs_start} to {prediction_date}", transform=ax1[i].transAxes, 
            ha='center', va='bottom', fontsize=10)
    
    # Set up filled and line contours
    # 10 levels from 0 to max
    max_val = samples[prediction_date].viirs.max()
    levels = np.linspace(0, max_val, 11)

    filled_c1 = ax1[i].contourf(combined_ds.x, combined_ds.y, samples[prediction_date].viirs, levels=levels, 
                        transform=crs_proj4)
    ax1[i].set_ylabel(f"{dates_sorted[i]}", fontsize=10, labelpad=10)
    # plot the MERRA2 data

    #heatmap_merra2 = combined_ds.sel(time=periods[0][1]).MERRA2_CNN_Surface_PM25.values

    # Lat/lon grid lines
    #ax.gridlines(draw_labels=False, dms=True, x_inline=False, y_inline=False)

    ax2[i] = fig.add_subplot(num_samples, 3, 2+3*i,  projection=crs_proj4)
    ax2[i].set_extent([x_min, x_max, y_min, y_max], crs=crs_proj4) # CONUS extent
    ax2[i].coastlines(resolution='50m')
    ax2[i].add_feature(cfeature.BORDERS)
    ax2[i].add_feature(cfeature.STATES)

    # Set up filled and line contours
    # 10 levels from 0 to max
    max_val = samples[prediction_date].merra2.max()
    levels = np.linspace(0, max_val, 11)

    filled_c2 = ax2[i].contourf(combined_ds.x, combined_ds.y, samples[prediction_date].merra2, levels=levels, 
                        transform=crs_proj4)

    # Lat/lon grid lines
    #ax2.gridlines(draw_labels=False, dms=True, x_inline=False, y_inline=False)
    if i== 0:
        ax2[i].set_title(f"Actual PM2.5 Levels (MERRA-2)", fontsize=14, fontweight='bold', y=1.08)
    ax2[i].text(x=0.5, y = 1.02, s = f"{prediction_date}, 12:30 PM", transform=ax2[i].transAxes, 
            ha='center', va='bottom', fontsize=10)
    
    # get the predicted data

    ax3[i] = fig.add_subplot(num_samples, 3, 3+3*i,  projection=crs_proj4)
    ax3[i].set_extent([x_min, x_max, y_min, y_max], crs=crs_proj4) # CONUS extent
    ax3[i].coastlines(resolution='50m')
    ax3[i].add_feature(cfeature.BORDERS)
    ax3[i].add_feature(cfeature.STATES)

    # plot with same scale as ax2 (for comparison)

    filled_c3 = ax3[i].contourf(combined_ds.x, combined_ds.y, predictions[prediction_date], levels=levels, 
                        transform=crs_proj4)

    # Lat/lon grid lines
    #ax2.gridlines(draw_labels=False, dms=True, x_inline=False, y_inline=False)
    if i== 0:
        ax3[i].set_title(f"Predicted PM2.5 Levels", fontsize=14, fontweight='bold', y=1.08)
    ax3[i].text(x=0.5, y = 1.02, s = f"{prediction_date}, 12:30 PM", transform=ax3[i].transAxes, 
            ha='center', va='bottom', fontsize=10)
    
    ax1[i].set_aspect("auto")
    ax2[i].set_aspect("auto")
    ax3[i].set_aspect("auto")
    
    fig.text(
        x= 0,  # X-position (slightly left of the subplots)
        y= 1 - (i + 0.5) / num_samples,  
        fontsize=12,  
        fontweight='bold',  
        ha='center',  #
        va='center', 
        rotation=90  
    )
    

#plt.subplots_adjust(hspace=0, wspace=0)  
#plt.subplots_adjust(left=0.5)
#plt.figure(constrained_layout=True)
plt.tight_layout(pad=1)

plt.show()
No description has been provided for this image

How can we improve the coloring of these plots?¶

  • currently, the scale for each plot is based on the range of its own data (split into 10 equal-sized bins)
  • however, the range of value varies considerably
  • would it look better if we used just two scales - one each for each kind of data (Fire Intensity, PM2.5)
InĀ [139]:
fig, ax = plt.subplots(1, 3, figsize=(10,3))
ax[0].hist(X.ravel(), bins=30, density=True, alpha=0.6, color='blue', edgecolor='black')
ax[0].set_title("VIIRS Data (X)")
ax[1].hist(y.ravel(), bins=30, density=True, alpha=0.6, color='blue', edgecolor='black')
ax[1].set_title("MERRA2 Data (y)")
ax[2].hist(y_full_pred.ravel(), bins=30, density=True, alpha=0.6, color='blue', edgecolor='black')
ax[2].set_title("Predictions (y-full_pred)")

fig.suptitle("Distribution of Values for VIIRS, MERRA2, and Predictions", y=1.05)
Out[139]:
Text(0.5, 1.05, 'Distribution of Values for VIIRS, MERRA2, and Predictions')
No description has been provided for this image

The distributions for each appear to have extremely long tails...¶

InĀ [141]:
print(f"X: {np.percentile(X.flatten(), [25, 50, 75, 95, 99, 100])}")
print(f"y: {np.percentile(y.flatten(), [25, 50, 75, 95, 99, 100])}")
print(f"y_full_pred: {np.percentile(y_full_pred.flatten(), [25, 50, 75, 95, 99, 100])}")
X: [5.85508330e-03 5.44194043e+00 4.31719635e+01 8.28093855e+02
 6.54301678e+03 1.55971633e+05]
y: [  0.69570203   1.83945251   4.00027323   9.62974434  27.52798326
 197.27983093]
y_full_pred: [ 3.07410912  3.07746838  3.10078387  3.58583099  7.11740139 99.45769825]

The "long tails" are confirmed above:¶

  • For X, there is a $10^2$ difference between 99th and 100th percentile
  • For y and y_pred_full, there is a $10^1$ difference between 99th and 100th percentile

Let's adjust the upper-range of our visualization (trial and error)¶

InĀ [142]:
viirs_max = np.percentile(X.flatten(), [99.5])[0]
pm25_max = np.percentile(y.flatten(), [99.5])[0]
InĀ [190]:
# Set up figure 
aspect_ratio = (y_max - y_min) / (x_max - x_min)
fig = plt.figure(figsize=(35*aspect_ratio, 35*3*1.15))   # extend width because of extra height from 1st two rows

levels_viirs = np.linspace(0, viirs_max, 11)
levels_pm25 = np.linspace(0, pm25_max, 11)

# num rows = samples + 2 (1st two rows for colorbar and spacing)
gs = gridspec.GridSpec(num_samples + 2, 3, height_ratios=[.08] + [0.02] + [1] * num_samples, hspace=0.2, wspace=0)

crs_proj4 = ccrs.AlbersEqualArea(central_longitude=-115, central_latitude=45, standard_parallels=(42, 48))

# colorbar in first row
for col in range(3):
    ax_cb = fig.add_subplot(gs[0, col])
    
    levels = levels_viirs if col == 0 else levels_pm25
    cmap = plt.cm.plasma if col == 0 else plt.cm.viridis
    norm = plt.Normalize(vmin=levels[0], vmax=levels[-1])
    #cbar.set_label(title, fontsize=10, fontweight='bold')
    cbar = fig.colorbar(plt.cm.ScalarMappable(norm=norm, cmap=cmap), cax=ax_cb, orientation='horizontal')
    
    box = ax_cb.get_position()
    cbar_width_ratio = 0.85  # adjust width of colorbars
    ax_cb.set_position([
        box.x0 + (1 - cbar_width_ratio) / 2 * box.width,  # center horiz
        box.y0,
        cbar_width_ratio * box.width,  # width
        box.height
    ])

# subplots for the data
for i in range(num_samples):

    prediction_date = dates_sorted[i]
    viirs_start = datetime.strftime(datetime.strptime(prediction_date, '%Y-%m-%d') - timedelta(weeks=2), '%Y-%m-%d')

    for col, dataset, levels, cmap, title in zip(
        range(3), 
        ["viirs", "merra2", predictions[prediction_date]], 
        [levels_viirs, levels_pm25, levels_pm25], 
        [plt.cm.plasma, plt.cm.viridis, plt.cm.viridis],
        ["Input (Fire Radiative Power) (W/cm^2)", "Actual PM2.5 Levels (g/um^2)", "Predicted PM2.5 Levels (g/um^2)"]
    ):
        ax = fig.add_subplot(gs[i + 2, col], projection=crs_proj4)
        ax.set_extent([x_min, x_max, y_min, y_max], crs=crs_proj4)
        ax.coastlines(resolution='50m')
        ax.add_feature(cfeature.BORDERS)
        ax.add_feature(cfeature.STATES)
        
        # Plot data
        data = getattr(samples[prediction_date], dataset) if col < 2 else dataset
        filled = ax.contourf(combined_ds['x'], combined_ds['y'], data, levels=levels, cmap=cmap, transform=crs_proj4)
        
        # Add title and labels for each column
        if i == 0:
            ax.set_title(title, fontsize=13, y=1.105, fontweight='bold')

        ax.text(
            x=0.5, y=1.02, 
            s=f"{viirs_start} to {prediction_date}" if col == 0 else f"{prediction_date}, 12:30 PM", 
            transform=ax.transAxes, 
            ha='center', va='bottom', fontsize=11
        )

        # critical for 
        #ax.set_aspect("auto")

#plt.tight_layout()
plt.show()
No description has been provided for this image

Discussion¶

Clearly, our model does not produce very accurate predictions. Only 8% of the PM2.5 levels are explainable by the the cumulative VIIRS data ($R^2$). However, the model is extremeley naive and the results hint at numerous avenues for improving our predictions.

First, we might add new input features along the spatial dimension. The place to start would probably be meteorological variables (wind, humidity, etc.) and existing VIIRS data at different resolutions (i.e. the raw point data). It also seems promising to incorporate PM2.5 data as inputs to our model, perhaps for a period of time close before our prediction target date.

Second, it seems likely we could improved model accuracy by extending our prediction target along the temporal dimension. Instead of predicting PM2.5 at precisely 12:30 pm on Sundays, we could predict the average PM2.5 value for an entire day (or some other time period). It would also be interesting to see if there are differences in predicting maximum PM2.5 versuss average levels.

InĀ [59]:
from IPython.display import Image

Paper 1¶

No description has been provided for this image

https://www.sciencedirect.com/science/article/pii/S0048969722014565

This paper describes a method of improving the accuracy of PM2.5 estimates in mainland China by augmenting MERRA-2 data with additional meteoroligcal variables. Recent research suggests that MERRA-2 underrates PM2.5 across China, particularly in the Winter and Autumm. The study added fifty-three meteorological factors available from the ERA-5 reanalysis datasets. Each of these datasets was regridded to a 0.5x0.5 degree raster grid, using Krigina and nearest neighbor interpolation. From here, the authors applied a Ligh Gradient Boosting Machine (LGBM) model, which is a decision tree model. Training was conducted on data from 2014 to 2018, and 2019 data was used for testing. Training performance varied significantly, performing much worse in the higher elevation areas to west and north. Generally, this method improved predictive performance over MERRA-2 alone, with significant spatial and temporal variations in performance. Overall, this was a well written and laid-out study that provides a good template for reanalysis studies that utilize machine learning techniques.

No description has been provided for this image No description has been provided for this image

Paper 2¶

No description has been provided for this image

https://ntrs.nasa.gov/api/citations/20220017880/downloads/Office%20Notes%20%2322%20-%20geosaerosoldocumentation%20-%20v2.pdf

This paper provides detailed view of the Aerosol estimates in MERRA-2 data. It is a primary reference in MERRA-2 documentation. PM2.5 is not measured directly by MERRA-2. Rather, it's a variable that is calculated based on estimates of five aerosol species that are measured directly in MERRA-2: black carbon, dust, organic carbon, sea salt, and sulfate. This paper describes a method for combining these measures into a single generalized PM2.5 volumetric estimate. However, it turns out that the density of each of the species changes with ambient relative hudidity. It seems that we would either have to estiamte RH, or obtain this data from other sources. We found a different data source that has already done this reanalysis, with the assumption that RH=0. Interestingly, this new data source comes with a matching set of error estimate for every grid point, suggesting that accuracy is not high.

No description has been provided for this image No description has been provided for this image

Paper 3¶

https://acdisc.gesdisc.eosdis.nasa.gov/data/HAQAST/MERRA2_CNN_HAQAST_PM25.1/doc/README_MERRA2_CNN_HAQAST_PM25_V1.pdf

This is a readme document on the MERRA-2 CNN_HAQAST Bias correction, where they trained a model to calibrate the merra pm2.5 data against ground observations. This is the dataset that we made use of above.

This paper makes use of one of the plot types we used for assessing our model predictions. Interestingly, you can see the original data has very similar qualities to the model that we produced, with large amounts of data falling close to one axis or the other.

InĀ [57]:
Image("Snaps/MERRA_README.png")
Out[57]:
No description has been provided for this image

Paper 4¶

https://arxiv.org/pdf/2312.15796

This paper covers one of Deepminds models targetting weather prediction called GenCast. Their main focus in this paper is to extend previous work in their Graphcast model which minimised RMSE, and thus produced simulations that were smoothed. In this paper they produce ensemble predictions that give more realistic scenarios that have similar micro behavior as observed in the realworld, and can also be used to give distribution predictions.

Our ambitions in this assignment are far away from this of course, but it is good to think about the different types of outputs you may be encouraging a model to produce.

In the snip below you can see the display of multiple samples from their generative weather model, compared to ground truth and ensemble means (and previous model). This displays the benefit of an approach like this quite cleanly.

InĀ [58]:
Image("Snaps/GenCast.png")
Out[58]:
No description has been provided for this image
InĀ [66]:
!jupyter nbconvert --to html Iteration4.ipynb
[NbConvertApp] Converting notebook Iteration4.ipynb to html
[NbConvertApp] WARNING | Alternative text is missing on 14 image(s).
[NbConvertApp] Writing 9766911 bytes to Iteration4.html